Lumin Bo1, Hongyu Fu1, Junchi Yang2. 1. Department of Gastroenterology, Changhai Hospital, Shanghai 200433, P.R. China. 2. Department of Gastrointestinal Surgery, Changhai Hospital, Shanghai 200433, P.R. China.
Abstract
Crohn's disease (CD) is a type of inflammatory bowel disease that cannot be fully cured by medication or surgery. In the present study, the aim was to understand the underlying mechanisms of CD. Two CD microarray datasets were downloaded from The Gene Expression Omnibus database: GSE36807 (13 CD and 7 normal samples) and GSE59071 (8 CD and 11 normal samples). A series of bioinformatics analyses were conducted, including weighted gene co‑expression network analysis to identify stable modules, and analysis of differentially expressed genes (DEGs) between CD and normal samples. The common DEGs in the GSE36807 and GSE59071 datasets were screened. Subsequently, overlapping genes in the stable modules and the DEGs were selected to construct a protein‑protein interaction (PPI) network using Cytoscape software. Enrichment analysis of genes in the network was performed to explore their biological functions. A total of 10 stable modules and 927 DEGs were identified, of which 234 genes were shared in the stable modules and the DEGs. After removal of 32 uncharacterized genes, 202 genes were selected to build the PPI network. Low density lipoprotein receptor (LDLR), toll‑like receptor 2 (TLR2), lipoprotein lipase (LPL), forkhead box protein M1 (FOXM1) and neuropeptide Y (NPY) were revealed as key nodes with high degree. Pathway enrichment analysis demonstrated that LPL was enriched in the peroxisome proliferator‑activated receptor (PPAR) signaling pathway. In conclusion, LDLR, TLR2, FOXM1 and NPY, as well as LPL in the PPAR signaling pathway may serve critical roles in the pathogenesis of CD.
Crohn's disease (CD) is a type of inflammatory bowel disease that cannot be fully cured by medication or surgery. In the present study, the aim was to understand the underlying mechanisms of CD. Two CD microarray datasets were downloaded from The Gene Expression Omnibus database: GSE36807 (13 CD and 7 normal samples) and GSE59071 (8 CD and 11 normal samples). A series of bioinformatics analyses were conducted, including weighted gene co‑expression network analysis to identify stable modules, and analysis of differentially expressed genes (DEGs) between CD and normal samples. The common DEGs in the GSE36807 and GSE59071 datasets were screened. Subsequently, overlapping genes in the stable modules and the DEGs were selected to construct a protein‑protein interaction (PPI) network using Cytoscape software. Enrichment analysis of genes in the network was performed to explore their biological functions. A total of 10 stable modules and 927 DEGs were identified, of which 234 genes were shared in the stable modules and the DEGs. After removal of 32 uncharacterized genes, 202 genes were selected to build the PPI network. Low density lipoprotein receptor (LDLR), toll‑like receptor 2 (TLR2), lipoprotein lipase (LPL), forkhead box protein M1 (FOXM1) and neuropeptide Y (NPY) were revealed as key nodes with high degree. Pathway enrichment analysis demonstrated that LPL was enriched in the peroxisome proliferator‑activated receptor (PPAR) signaling pathway. In conclusion, LDLR, TLR2, FOXM1 and NPY, as well as LPL in the PPAR signaling pathway may serve critical roles in the pathogenesis of CD.
Crohn's disease (CD) is a type of inflammatory bowel disease (IBD) that can affect any part of the gastrointestinal tract (1). This disease is caused by immune, bacterial and environmental factors in individuals with genetic susceptibility (2–4). Patients with CD usually suffer from weight loss, abdominal pain, fever and diarrhea (5); other complications associated with CD include arthritis, anemia, inflammation of the eye, fatigue and skin problems (6). CD is a common disease in the developed world, with increasing incidence and prevalence in developing countries (7,8). At present, there are no medical treatments or operative interventions that can fully cure CD, and in some cases surgery may even result in recurrence of the disease (9). Therefore, the underlying mechanisms of CD need to be explored in order to develop novel treatment strategies.So far, some progress has been made with understanding the pathogenesis of CD. For example, it was demonstrated that mutation of the autophagy-related 16-like 1 gene is associated with CD, and may possibly support the role of autophagy in the development and progression of IBD (10). T helper (Th) 17 cells are a unique Th cell lineage that serve an important role in inflammatory diseases by secreting proinflammatory cytokines, including interleukin 17 (IL-17) and IL-23, and may mediate the Th1/Th17 imbalance seen in CD and ulcerative colitis (11,12). A previous study reported that signal transducer and activator of transcription 3 (STAT3) and Janus kinase 2 (JAK2), which are involved in the STAT-JAK pathway, can increase the risk of CD (13). Nucleotide-binding oligomerization domain-containing 2 (NOD2) is expressed throughout the small intestine, with highest expression in the terminal ileum where there is an abundance of Paneth cells, and is also closely associated with pathogenesis of CD (14,15). Despite these profound findings, pathogenesis of CD has not been fully elucidated.Bioinformatics analysis is a useful tool for exploring disease pathogenesis and screening novel therapeutic targets (16). Kenny et al performed a genome-wide association study in an Ashkenazi Jewish population with CD and revealed that 16 replicated and novel loci made up 11.2% of the total genetic variations associated with CD risk (17). Fransen et al employed expression quantitative trait loci to select single nucleotide polymorphisms (SNPs) for follow-up and identified two CD-associated SNPs in the ubiquitin-conjugating enzyme E2L 3 and B-cell chronic lymphocytic leukemia/lymphoma 3 genes (18). These findings provide some information on the etiology of CD; however, to the best of our knowledge the mechanisms of CD pathogenesis have not yet been investigated by comprehensive bioinformatics analysis.In the present study, comprehensive bioinformatics analysis was carried out to understand the pathogenesis of CD and identify novel therapeutic targets. The gene expression profiles of patients with CD were downloaded from a public database, and subsequently a series of bioinformatics analyses were performed, including weighted gene co-expression network analysis (WGCNA), meta-analysis, protein-protein interaction (PPI) network analysis and enrichment analysis for key genes associated with CD. This study contributes towards the further understanding of the mechanisms underlying human CD.
Materials and methods
Data source
The key words ‘Crohn's disease’ and ‘Homo sapiens’ were used to search relevant gene expression profile data on the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The inclusion criteria were as follows: i) Gene expression profiles; ii) intestinal tissues from patients with CD (not cells); iii) availability of both CD and normal samples; iv) human samples and v) number of total samples ≥18. Based on these criteria, two microarray datasets were selected: GSE36807 (Platform GPL570; 13 CD and 7 normal samples) (19) and GSE59071 (Platform GPL6244; 8 CD and 11 normal samples) (20).
Data pre-processing
The raw data (CEL files) were downloaded from GEO. Subsequently, the oligo package version 1.41.1 from R (http://www.bioconductor.org/packages/release/bioc/html/oligo.html) (21) was utilized to perform pre-treatments, including conversion of data format, summarization using median-polish, background correction using the MAS method and data normalization by quantile method.
WGCNA analysis
WGCNA is a popular systems biology tool used to construct gene co-expression networks, which can be used to detect disease-associated gene clusters and identify therapeutic targets (22). GSE36807 and GSE59071 were used as the training and validation sets, respectively. Stable gene modules associated with CD were screened using the R WGCNA package version 1.61 (https://cran.r-project.org/web/packages/WGCNA/index.html) (22), with the following parameters; cutHeight=0.95 and ≥30 genes/module; the correlation coefficients (CC) were also calculated.
Meta-analysis
Using the MetaDE.ES function in the R MetaDE package (https://cran.r-project.org/src/contrib/Archive/MetaDE/) (23), differentially expressed genes (DEGs) between CD and normal samples that were common to the two datasets were identified. To ensure homogeneity of genes, tau2=0, Qpval >0.05 and false discovery rate <0.05 were used as the cut-off criteria.
Construction of the PPI network
Genes in the stable modules and DEGs were compared, and overlapping genes were used as candidate genes for construction of the PPI network. An integrated PPI network was generated from the candidate genes by merging numerous PPI databases, including BioGRID version 3.4.153 (http://thebiogrid.org/) (24), Human Protein Reference Database release 9 (http://www.hprd.org/) (25) and STRING version 10.5 (https://string-db.org/) (26). The PPI network was visualized using the Cytoscape software version 3.3.0 (http://www.cytoscape.org/) (27).
Functional and pathway enrichment analyses
The Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.8 (https://david.ncifcrf.gov/) is an easy-to-use web tool in which genes are annotated and summarized according to shared categorical data (28). Based on DAVID analysis, Gene Ontology (GO) (29) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) (30) pathway enrichment analysis were conducted for the nodes (a node represents a gene) identified in the PPI network, with the criterion of P<0.05.
Results
To ensure the GSE36807 and GSE59071 datasets were comparable, correlation of gene expression and connectivity of common genes was analyzed. The results revealed that the CC for gene expression and connectivity between the two datasets were 0.82 (P<1×10−200; Fig. 1A) and 0.51 (P<1×10−200; Fig. 1B), respectively. Therefore, it was deduced that the GSE36807 and GSE59071 datasets were comparable.
Figure 1.
Correlation between (A) gene expression and (B) gene connectivity in the GSE36807 and GSE59071 datasets. Cor, correlation.
To meet the prerequisite of scale-free network distribution, the value of the adjacency matrix weighting parameter ‘power’ was explored. After setting the ranges of parameters for the network, the scale-free topology matrix was calculated. Using GSE36807 as the essential data, the scale-free topology model fit was calculated and statistics were selected for graphing (Fig. 2A). The greater the value of CC2, the closer the network was to scale-free distribution. Based on this theory, the mean connectivity of genes was calculated using power=8 as this was the minimum power value to achieve R2=0.9. The results demonstrated that the mean connectivity was 16, which conforms to the node connection properties of a scale-free network (Fig. 2B).
Figure 2.
(A) Selection of the weighting parameter ‘power’. The red line indicates when the square of correlation coefficient equals 0.9. (B) Mean connectivity of genes according to ‘power’. The red line indicates a mean connectivity of 16 when the ‘power’ is 8.
Using GSE36807 as the training set, disease-associated modules were screened. Variation in gene expression in the samples was calculated, and the genes with variation coefficients >0.1 were selected. Subsequently, the community dissimilarities among the genes were calculated, and a clustering tree was obtained. A total of 10 modules (D1M1, D1M2, D1M3, D1M4, D1M5, D1M6, D1M7, D1M8, D1M9 and D1M10) were identified using cutHeight=0.95 (Fig. 3A). Similarly, module division was performed for GSE59071 (Fig. 3B) and the modules in this dataset were used to evaluate the stability of those identified in the training set.
Figure 3.
Module partition trees of (A) GSE36807 and (B) GSE59071; each color represents a different module.
The correlation of gene expression was analyzed for the same color modules in the two datasets, and the results demonstrated that 10 modules were stable (Table I; preservation Z-score >5; P≤0.05). According to the expression similarities of the module genes, the correlation of the modules in GSE36807 (Fig. 4A) and GSE59071 (Fig. 4B) were analyzed (Table I). By combining the correlations among the modules with the GO categories for genes in the stable modules (Table I), a network for the modules was constructed (Fig. 4C). This revealed that genes in three stable modules (M1, M2 and M8) were significantly associated with cell adhesion and genes in two stable modules (M4 and M10) were significantly associated with immune response.
Table I.
Correlation of gene expression for the same color modules in GSE36807 and GSE59071.
Modules
Module preservation
GSE36807
GSE59071
Color
Size
Correlation
P-value
Z-score
Module characterization
D1M1
D2M1
Black
163
0.40
1.2×10−7
20.34
Cell adhesion
D1M2
D2M2
Blue
431
0.37
2.0×10−15
10.05
Cell adhesion
D1M3
D2M3
Brown
412
0.30
5.1×10−10
15.88
Cell-cell signaling
D1M4
D2M4
Green
180
0.50
8.9×10−13
16.73
Immune response
D1M5
D2M5
Grey
368
0.35
4.8×10−12
4.12
–
D1M6
D2M6
Magenta
38
0.32
5.0×10−2
11.74
Cell cycle
D1M7
D2M7
Pink
123
0.41
2.5×10−6
5.53
Regulation of transcription
D1M8
D2M8
Red
177
0.35
1.8×10−6
21.44
Cell adhesion
D1M9
D2M9
Turquoise
944
0.68
4.2×10−129
17.93
Ion transport
D1M10
D2M10
Yellow
352
0.42
1.8×10−16
29.51
Immune response
Size is the number of genes in one module. Module characterization is the significant functional term for each module. 510 indicate stable and highly stable, respectively. D1, Dataset1 GSE36807; D2, Dataset2 GSE59071; M, module.
Figure 4.
Correlation cluster diagrams for the modules of (A) GSE36807 and (B) GSE59071. (C) Module-module correlation network. Single lines connect modules within the same dataset and double lines connect modules across datasets. The thickness of the links indicates the extent of correlation. D1, dataset 1 GSE36807; D2 dataset 2 GSE59071; M, module.
After calculation of the parameter values of each gene, a total of 927 DEGs were screened according to the aforementioned thresholds. The heatmap revealed that changes in expression of the DEGs were similar for GSE36807 and GSE59071 (Fig. 5).
Figure 5.
Clustering heatmap of differentially expressed genes.
A total of 234 overlapping genes from genes in the stable modules (3,188 genes) and the 927 DEGs were identified and used as candidate genes (Fig. 6A). The number and proportion of these genes in different modules are shown in Fig. 6B. A total of 32 uncharacterized genes in the grey module were removed from subsequent analyses. PPIs were predicted for the remaining 202 genes and a PPI network (123 nodes and 270 edges) was constructed (Fig. 7). Notably, low density lipoprotein receptor (LDLR; node degree=22), toll-like receptor 2 (TLR2; node degree=19), lipoprotein lipase (LPL; node degree=18), forkhead box M1 (FOXM1; node degree=17) and neuropeptide Y (NPY; node degree=16) were the top five nodes with high degrees in the PPI network.
Figure 6.
(A) Venn diagram depicting the overlap between the 3,188 genes in the stable modules and the 927 differentially expressed genes. (B) Pie chart of the number of genes in different modules. WCGNA, weighted gene co-expression network analysis.
Figure 7.
Protein-protein interaction network. Upright red and inverted green triangles represent upregulated and downregulated proteins, respectively. The color intensity indicates gene expression. The enlarged nodes are the top five nodes with highest degrees. FC, fold change.
A total of 24 GO categories and 12 KEGG pathways were enriched for the PPI network nodes. Notably, the network nodes were mainly implicated in the enzyme-linked receptor protein signaling pathway (GO category; P=0.00185), cell-cell signaling (GO category; P=0.003505), peroxisome proliferator-activated receptor (PPAR) signaling pathway (KEGG pathway; P=0.001014) and cytokine-cytokine receptor interaction (KEGG pathway; P=0.004187) (Fig. 8).
Figure 8.
(A) Functional categories and (B) pathways enriched for the protein-protein interaction network nodes. BP, biological process; CC, cellular component; GO, Gene Ontology; MAPK, mitogen-activated protein kinase; MF, molecular function; PPAR, peroxisome proliferator-activated receptor.
Discussion
In the present study, a total of 10 stable CD-associated modules were identified using WGCCA; of which three stable modules (M1, M2 and M8) and two stable modules (M4 and M10) were significantly associated with cell adhesion and immune response, respectively. In addition, 927 overlapping DEGs in the GSE36807 and GSE59071 datasets were screened. In the PPI network, five important nodes with relatively high node degrees were identified as LDLR, TLR2, LPL, FOXM1 and NPY.Previous studies have investigated the relationship between inflammatory cytokines and LDLR (31,32), and demonstrated that the inflammatory cytokines IL-1β and tumor necrosis factor (TNF)-α can regulate LDLR expression leading to accumulation of high concentrations of low-density lipoprotein (33). FOXM1 belongs to the forkhead box (FOX) family of transcription factors, and regulates key genes involved in goblet cell metaplasia and lung inflammation (34,35). FOXM1 is an important mediator of cell proliferation, and its expression is enhanced during gastric carcinogenesis induced by Helicobacter pylori infection (36). Overexpression of triglyceride-rich lipoproteins can induce lipid accumulation and early inflammatory responses, and LPL serves an anti-inflammatory role by hydrolyzing triglyceride-rich lipoproteins (37). PPARs can regulate lipid metabolism-associated genes and control inflammation, and PPARα and -γ activators contribute to macrophage secretion of LPL (38,39). At present, there is no direct evidence for the involvement of LDLR, FOXM1 and LPL in CD development. However, their expression levels may be linked to disease progression. A study demonstrated that polymorphisms of LIGHT (a TNF superfamily member) are associated with CD, and inhibiting LIGHT reduces dyslypidemia in mice lacking LDLR (40). This suggests a possible association between LDLR and CD. In IBD, the susceptibility genes include FYN proto-oncogene and HCK proto-oncogene, which are involved in the phosphoinositide 3-kinase signaling network, which also contains FOXM1 (41). Since CD is a major subtype of IBD, FOXM1 may also be associated with CD. With regards to the LPL gene, decreased expression is observed in CD adipose-derived stem cells (ASCs) independent of clinical stage, compared with in healthy ASCs, indicating that CD modifies the plasticity of mesenteric ASCs (42). In the present study, LPL was upregulated in CD PPI network, indicating a potential role of LPL in the progression of CD; pathway enrichment analysis revealed that LPL was enriched in the PPAR signaling pathway. Interestingly, the alterations in LPL expression levels of ASC cells and CD tissues revealed different trends, suggesting that the creeping fat tissue may be associated with the immunomodulatory properties in patients with CD. Overall, the results demonstrated that increases in LDLR, FOXM1 and LPL may be highly relevant to the development of CD.TLRs initiate immune responses to microbial infection, with TLR2 recognizing bacterial lipoproteins, zymosan, peptidoglycan and lipoteichoic acids (43,44). TLR2, TLR4, and their transmembrane co-receptor cluster of differentiation 14, are all upregulated in the terminal ileum of patients with IBD, and their dysregulation may serve critical roles in the pathogenesis of IBD (45). NOD2 mediates the anti-inflammatory cytokine bias induced by TLR2 ligands; therefore, defective NOD2 function may promote inflammation and increase the disease risk of CD (46). TLR2 and TLR4 have been reported to be overexpressed in the inflamed colonic mucosa of children with IBD, indicating that innate immunity is associated with the pathogenesis of IBD (47). A previous study discovered that TLR2 and TLR4 are highly expressed on inflammation-associated intestinal macrophages (IMACs), leading to a higher reactivity to lipopolysaccharide and possibly CD (48). In addition, the role of Gp96, an endoplasmic reticulum chaperone for multiple protein substrates, in CD has been investigated. The results demonstrated that the lack of Gp96 in CD IMACs is associated with loss of tolerance against the host gut flora, and that the Gp96 knockdown cell line has decreased TLR2 and TLR4 expression (49). Therefore, it may be hypothesized that TLR2 has an important role in the pathogenesis of CD. Neurogenic inflammation is critical for the development of IBD, and NPY induces oxidative stress and colitis by regulating the expression of neuronal nitric oxide synthase (50). NPY and the NPY receptor Y1 (NPY1R) are associated with intestinal inflammation and suppression of NPY1R signaling may represent a potential therapeutic strategy for colonic inflammation (51,52). NPY peptides have been implicated in various gastrointestinal disorders, including IBD, malabsorption and short gut; and NPY receptor agonists and antagonists can be used for preventing diarrhea and intestinal inflammation (53). Gut neurohormones, including NPY, can affect inflammation by interacting with the immune system, and NPY represents a promising target for treating IBD (54). In the present study, NPY was reported as an important node with a high degree in the CD PPI network; however, the expression levels of NPY were lower in the CD samples of patients compared with in the healthy control, which is inconsistent to the reports (50,52). Considering the important role of NPY in CD and that its expression levels varies from that of previous reports, further validation of NPY mRNA and protein expression in clinical samples is required.Nevertheless, there are some limitations to the present study. Firstly, the results are solely based on bioinformatics analysis and although various important genes for CD etiology were identified, experimental validation is required. Secondly, only two CD microarray datasets were selected with relatively small sample sizes; therefore, this could affect the robustness of the results. Further experiments will be designed and conducted to validate these findings in follow-up studies.In conclusion, a total of 10 stable modules and 927 DEGs associated with CD were identified. Notably, LDLR, TLR2, FOXM1, NPY and LPL may be the key genes involved in pathogenesis of the disease. LPL may exert its effects through the PPAR signaling pathway.
Authors: Rose Oughtred; Andrew Chatr-aryamontri; Bobby-Joe Breitkreutz; Christie S Chang; Jennifer M Rust; Chandra L Theesfeld; Sven Heinicke; Ashton Breitkreutz; Daici Chen; Jodi Hirschman; Nadine Kolas; Michael S Livstone; Julie Nixon; Lara O'Donnell; Lindsay Ramage; Andrew Winter; Teresa Reguly; Adnane Sellam; Chris Stark; Lorrie Boucher; Kara Dolinski; Mike Tyers Journal: Cold Spring Harb Protoc Date: 2016-01-04
Authors: Mihai G Netea; Bart Jan Kullberg; Dirk J de Jong; Barbara Franke; Tom Sprong; Ton H J Naber; Joost P H Drenth; Jos W M Van der Meer Journal: Eur J Immunol Date: 2004-07 Impact factor: 5.532
Authors: Rasa Elmentaite; Alexander D B Ross; Kenny Roberts; Kylie R James; Daniel Ortmann; Tomás Gomes; Komal Nayak; Liz Tuck; Sophie Pritchard; Omer Ali Bayraktar; Robert Heuschkel; Ludovic Vallier; Sarah A Teichmann; Matthias Zilbauer Journal: Dev Cell Date: 2020-12-07 Impact factor: 12.270