Literature DB >> 34522169

Identification of Key Genes and Pathways associated with Endometriosis by Weighted Gene Co-expression Network Analysis.

Jingni Wu1,2, Xiaoling Fang1, Xiaomeng Xia1.   

Abstract

Background: Endometriosis is a common gynecological disorder with high rates of infertility and pelvic pain. However, its pathogenesis and diagnostic biomarkers remain unclear. This study aimed to elucidate potential hub genes and key pathways associated with endometriosis in ectopic endometrium (EC) and eutopic endometrium (EU). Material and Method: EC and EU-associated microarray datasets were obtained from the gene expression omnibus (GEO) database. Gene set enrichment analysis was performed to obtain further biological insight into the EU and EC-associated genes. Weighted gene co-expression network analysis (WGCNA) was performed to find clinically significant modules of highly-correlated genes. The hub genes that belong to both the weighted gene co-expression network and protein-protein interaction (PPI) network were identified using a Venn diagram.
Results: We obtained EC and EU-associated microarray datasets GSE7305 and GSE120103. Genes in the EC were mainly enriched in the immune response and immune cell trafficking, and genes in the EU were mainly enriched in stress response and steroid hormone biosynthesis. PPI networks and weighted gene co-expression networks were constructed. An EC-associated blue module and an EU-associated magenta module were identified, and their function annotations revealed that hormone receptor signaling or inflammatory microenvironments may promote EU passing through the oviducts and migrating to the ovarian surfaces, and adhesion and immune correlated genes may induce the successful ectopic implantation of the endometrium (EC). Twelve hub genes in the EC and sixteen hub genes in the EU were recognized and further validated in independent datasets.
Conclusion: Our study identified, for the first time, the hub genes and enrichment pathways in the EC and EU using WGCNA, which may provide a comprehensive understanding of the pathogenesis of endometriosis and have important clinical implications for the treatment and diagnosis of endometriosis. © The author(s).

Entities:  

Keywords:  WGCNA; biological markers; endometriosis; pathway

Mesh:

Year:  2021        PMID: 34522169      PMCID: PMC8436105          DOI: 10.7150/ijms.63541

Source DB:  PubMed          Journal:  Int J Med Sci        ISSN: 1449-1907            Impact factor:   3.738


Introduction

Endometriosis is an estrogen-dependent gynecological disorder characterized by the growth of endometrium in ectopic locations 1. A total of 30-50% of women with endometriosis suffer from pain and/or unexplained infertility 2. Despite several theories (i.e., retrograde menstruation, coelomic metaplasia, Müllerian remnants) that have been proposed, the pathogenesis of endometriosis is still unknown. The widely accepted retrograde menstrual reflux hypothesis states that eutopic endometrium (EU) migrates and survives outside the cavity of uterus, then establishes new endometriosis lesions. It is believed that the elucidation of molecular and functional specificities of the ectopic endometrium (EC) and EU facilitates a better understanding of the complex physiopathology of endometriosis. Previous studies have shown that the EC may behave differently from its eutopic counterpart 3. However, these studies mainly focused on a few molecules or the gene expression differences between different tissues, without considering the intrinsic relationship between these genes. In addition, there is still room for improvement of the bioinformatics algorithm in analyzing these transcriptomic data, and the specific biomarkers and roles of the EC and EU in endometriosis remain uncertain. Therefore, our study for the first time explored the genomic alteration profiles of the two entities of endometriosis using weighted gene co-expression network analysis (WGCNA) to identify endometriosis-associated biomarkers and pathways. Currently, there is no standard protocol for analyzing transcriptomic data. Network analysis is a promising direction that allows for a greater ability to recognize biological themes or pathways. It combines biology and network science to study the relationships of interacting components, which may provide novel and comprehensive insights into the diseases from the level of multiple genes 4. WGCNA is a network method for identifying highly correlated gene expression modules in different samples and analyzing the correlation between the module and disease type/clinical phenotype. Hence, WGCNA has been widely used to explore the biomarkers and therapeutic targets of various diseases, such as breast cancer 5. WGCNA has also been used to identify biologically related modules. For example, Wang et al. found fifteen hub genes that were highly correlated with the progression and prognosis of clear cell renal cell carcinoma using WGCNA 6. As a result, using WGCNA, we attempted to identify the modules of co-expressed genes highly associated with endometriosis and their key drivers. Meanwhile, we tried to explore the key pathways of the EC and EU in the pathogenesis of endometriosis. Our study may provide a better understanding of the disorder through the comparison between the eutopic and ectopic endometrium, and provide a new insight into the molecular mechanisms underlying the pathogenesis of endometriosis.

Materials and methods

Study design

To illustrate the data preprocessing, analysis and validation, a schematic flow diagram of the study is presented in Figure 1.
Figure 1

Flow diagram of strategy for data preparation, preprocessing, and analysis.

Data acquisition

Expression profiles of endometriosis-associated mRNAs in GSE7305, GSE120103, GSE7307 and GSE51981 were downloaded from Gene Expression Omnibus (GEO) database. The microarray datasets GSE7305 and GSE120103 with complete clinical information and same menstrual cycle were used as training sets to identify hub differentially expressed genes (DEGs) of endometriosis, GSE7307 and GSE51981 were used as test sets to validate our results, respectively. Dataset GSE7305 7 performed on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) was used to recognize hub DEGs in ovarian endometrioma, which includes 10 ovarian endometriomas from women with endometriosis (EC) and 10 normal endometria (Ctrl). Dataset GSE120103 8 performed on the GPL6480 platform (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F) was applied to identify hub DEGs in eutopic endometrium, which includes 9 eutopic endometria from fertile women with endometriosis (EU) and 9 Ctrl. Dataset GSE7307 performed on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) was used to validate EC-associated hub DEGs, which includes 23 EC and 18 Ctrl. And Dataset GSE51981 performed on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) was used to validate EC-associated hub DEGs, which includes 38 EU and 71 Ctrl.

Data preprocessing and differentially expressed genes (DEGs) identification

Limma (linear models for microarray data) package 9,10 in R software was utilized to correct the data background and identify the DEGs in EC vs Ctrl and EU vs Ctrl groups. Batch effect was removed using the limma package removeBatchEffect function. Benjamin and Hochberg method was used for multiple testing corrections 11. The false discovery rate (FDR) <0.05 and |log2 (Fold Change)| (|log2FC|) ≥1 was the cut-off criteria for screening DEGs.

Gene-set enrichment analysis (GSEA)

To evaluate the molecular mechanisms of endometriosis, GSEA of the gene expression profiles of GSE7305 and GSE120103 was performed using the “ClusterProfiler” package in R (http://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html). The genes were listed based on their expression levels, and were further mapped to the annotated gene sets of c5 (Gene Ontology (GO) gene sets) and c2 (curated gene sets) in Molecular Signatures Database (MSigDB). Gene sets with P-value <0.05 and FDR <25% are considered as significant 12.

Protein-protein interaction (PPI) network construction

The search tool for retrieval of interacting genes (STRING) database was used to identify the interactions among DEGs with the parameters of protein interaction score>0.4. Thereafter, the PPI network is constructed by Cytoscape. The potential hub DEGs were determined by Molecular Complex Detection (MCODE) plug-in (K-score>3) 13.

Weighted gene co-expression network construction

We used the WGCNA R package to establish co-expression networks 14 for the genes in GSE7305 and GSE120103. The unqualified genes were screened out. A matrix of genes' similarity by Pearson's correlation analysis was created. Appropriate soft threshold power (β) was applied to strengthen this matrix to a scale-free co-expression network. For this purpose, we choose the lowest power (14 or 22) for which the scale-free topology fit index curve flattens out upon reaching a high value (above 0.8). Furthermore, the adjacency matrix was transformed into the topological overlap matrix (TOM). Genes with higher TOM values indicate higher connectivities in the network; that is, more adjacencies to other network-generated genes 15,16. Meanwhile, genes were clustered hierarchically by the TOM-based dissimilarity (1-TOM) measure. The highly correlated genes were assigned to the same module.

Clinically significant module identification and function analysis

The correlation between modules and clinical traits was investigated by the module-trait relationship analysis of WGCNA. The modules that most relevant to the clinical traits could be identified. In this study, the endometriosis-associated blue and magenta modules were chosen for the subsequent analyses. Metascape was used to explore the function annotations (GO biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways) of these two modules. Terms with P-value<0.01, count≥3 and an enrichment factor>1.5 were considered statistically significant.

Hub genes identification and verification

We analyzed the gene significance (GS, the correlation between the gene and a clinical phenotype of interest) and module membership (MM, the correlation between gene expression profile and module eigengene) of each gene in the clinically significant blue and magenta modules. The module eigengene is defined as the main component of the module's gene expression matrix. |MM|>0.6 and |GS|>0.8 were set as the threshold for screening candidate hub genes that strongly associated with EC or EU. In the end, the Venn diagram was performed to identify the common hub genes from PPI network analysis and WGCNA. In addition, GSE7307 and GSE51981 were used as validation data sets. “ggplot2” (Ito & Murphy, 2013) R package was applied to show relative expression of the identified hub genes in different comparison groups (EC vs Ctrl or EU vs Ctrl).

Results

Identification of differentially expressed genes

With the cut-off criteria (FDR<0.05 and |log2FC|>1), a total of 1487 DEGs (824 upregulated and 663 downregulated) were identified between the EC and Ctrl in the GSE7305 dataset, and a total of 5794 DEGs (1974 upregulated and 3820 downregulated) were identified between the EU and Ctrl in the GSE120103 dataset. Volcano plots show the variation of DEGs in the EC versus Ctrl (Figure 2A) and EU versus Ctrl (Figure 2B).
Figure 2

Differentially expressed genes in endometriosis. (A) Volcano map of differentially expressed genes in EC compared with Ctrl. (B) Volcano map of differentially expressed genes in EU compared with Ctrl. EC: ectopic endometrium from patient with endometriosis. EU: eutopic endometria from patient with endometriosis. Ctrl: endometrium from patient without endometriosis.

Gene-set enrichment analysis

GSEA revealed the genes in the EC were mainly enriched in the immune response and immune cell trafficking, such as protein activation cascade, complement activation, regulation of humoral immune response, complement and coagulation cascades pathway and chemokine signaling pathway (Figure 3A-B), and the genes in the EU were mainly enriched in the stress response and steroid hormone biosynthesis, such as the stress response to copper ion, activation of GTP hydrolases (GTPases) activity, Human ATP-binding cassette (ABC) transporter dependent pathway and steroid hormone biosynthesis pathway (Figure 3C-D). We explored the functions of the two entities of endometriosis using genomic alteration profiles. These function annotations revealed the distinct roles of the EC and EU in the pathological process of endometriosis, which demonstrated the reliability of our results.
Figure 3

Gene set enrichment analysis (GSEA) of genes in endometriosis. GSEA-identified biological processes (A) and pathways (B) with significant enrichment in EC compared with Ctrl. GSEA-identified biological processes (C) and pathways (D) with significant enrichment in EU compared with Ctrl.

Construction of protein-protein interaction network

To identify the candidate hub genes, the most differentially expressed 362 genes in the EC versus Ctrl and 1992 genes in the EU versus Ctrl were selected for the PPI network construction. The MCODE clustering algorithm was applied to analyze the PPI network. With a threshold of k-scores>3, seven clusters with 78 candidate hub genes in the EC and 21 clusters with 205 candidate hub genes in the EU were selected. Figure 4A-D depicts the top two clusters in the EC and EU.
Figure 4

Protein-protein interaction network cluster analysis. The PPI network of differentially expressed genes in EC versus Ctrl or EU versus Ctrl was constructed using the STRING website. The MCODE clustering algorithm was applied to the network to identify the potential hub genes. Top two clusters in each group are shown in the figure. In the EC-associated PPI network, cluster 1 consists of 11 nodes and 31 edges (A) and cluster 2 consists of 11 nodes and 31 edges (B). In the EU-associated PPI network, cluster 1 consists of 21 nodes and 209 edges (C) and cluster 2 consists of 11 nodes and 55 edges (D). PPI: protein-protein interaction; STRING, search tool for retrieval of interacting genes/proteins.

Construction of the weighted gene co-expression network and identification of clinically significant modules

The values of β = 14 (scale free R2 = 0.80) and β = 22 (scale free R2 = 0.85) were selected as the soft-threshold powers to ensure scale-free networks using R package of “WGCNA” (Figure 5A-B). Genes with similar expression patterns were clustered into co-expression modules that were displayed in different colors. A total of 58 and 39 modules were identified (Figure 5C-D).
Figure 5

Weighted gene co-expression network analysis of genes in endometriosis. Analysis of the scale-free topology fit index and the mean connectivity for various soft-threshold powers (β) for the genes in the EC (A) and EU (B). Dendrogram of all expressed genes in the EC (C) and EU (D) clustered based on a dissimilarity measure (1‐TOM). Determination of module-trait relationship of the EC (E) or EU (F) group in endometriosis and identification of the most clinically relevant modules; each row indicates a module eigengene (the first principal component of the gene expression matrix in a module), and each column represents a clinical trait. TOM: topological overlap matrix.

The relevance between each module and clinical information was shown in the module-trait relationship (Figure 5E-F). In this situation, we focused on the EC and EU-associated key modules. The blue module, containing 2036 genes, was most correlated with the EC (R=0.87, p=5×10-6). Meanwhile, the magenta module, containing 768 genes, was most correlated with the EU (R=0.97, p=1×10-12). Hence, the blue and magenta modules were clinically significant and used for the following analyses in this study.

Function analysis of the most significant module

To explore the function mechanism of genes in the clinically significant modules, GO analysis and KEGG analysis were conducted. Function analysis revealed that the main biological processes and pathways of the blue module were regulation of cell adhesion, autophagy, FoxO signaling pathway and focal adhesion pathway (Figure 6A-B), and those of the magenta module were regulation of the mitogen-activated protein kinase (MAPK) cascade, regulation of the growth hormone receptor, the tumor necrosis factor (TNF) signaling pathway and NOD-like receptor signaling pathway (Figure 6C-D). These function annotations for the blue and magenta modules are listed in Table 1 and Table 2. The genes in the top EC-associated module mainly played roles in autophagy, focal adhesion and cancer, while those in the top EU-associated module were involved in creating an estrogen-rich and inflammatory microenvironment.
Figure 6

Function annotations of clinically significant module. The GO biological processes (A) and KEGG pathways (B) of genes in EC-associated blue module. The GO biological process (C) and KEGG pathway (D) of genes in EU-associated magenta module. GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes.

Table 1

The GO and KEGG Pathway analysis of genes in the blue module

CategoryTermDescriptionLog(p-value)
GO Biological ProcessesGO:0030029actin filament-based process-20.593297
GO Biological ProcessesGO:0071417cellular response to organonitrogen compound-20.2406423
GO Biological ProcessesGO:0030155regulation of cell adhesion-20.0365408
GO Biological ProcessesGO:0006914autophagy-15.9477686
GO Biological ProcessesGO:0010942positive regulation of cell death-14.3658968
GO Biological ProcessesGO:0007264small GTPase mediated signal transduction-14.0979829
GO Biological ProcessesGO:0070848response to growth factor-13.3811234
GO Biological ProcessesGO:0045055regulated exocytosis-13.372418
GO Biological ProcessesGO:0051345positive regulation of hydrolase activity-13.0915765
GO Biological ProcessesGO:0001568blood vessel development-12.7380519
GO Biological ProcessesGO:0000904cell morphogenesis involved in differentiation-12.5393524
GO Biological ProcessesGO:0043062extracellular structure organization-12.4268848
GO Biological ProcessesGO:0009611response to wounding-12.0333641
GO Biological ProcessesGO:0043408regulation of MAPK cascade-11.5789888
GO Biological ProcessesGO:0022604regulation of cell morphogenesis-11.4691072
GO Biological ProcessesGO:0002576platelet degranulation-11.2263844
GO Biological ProcessesGO:0019221cytokine-mediated signaling pathway-10.9869128
GO Biological ProcessesGO:0045936negative regulation of phosphate metabolic process-10.9676618
GO Biological ProcessesGO:0003012muscle system process-10.2900458
GO Biological ProcessesGO:0051129negative regulation of cellular component organization-10.2118548
KEGG Pathwayhsa04068FoxO signaling pathway-9.5
KEGG Pathwayhsa04510Focal adhesion-9
KEGG Pathwayhsa04014Ras signaling pathway-8.6
KEGG Pathwayhsa04010MAPK signaling pathway-8.2
KEGG Pathwayhsa05200Pathways in cancer-7.3
KEGG Pathwayhsa05205Proteoglycans in cancer-7.2
KEGG Pathwayhsa04380Osteoclast differentiation-6.8
KEGG Pathwayhsa04610Complement and coagulation cascades-6.6
KEGG Pathwayhsa04630Jak-STAT signaling pathway-6
KEGG Pathwayhsa05418Fluid shear stress and atherosclerosis-5.9
KEGG Pathwayhsa04270Vascular smooth muscle contraction-5.9
KEGG Pathwayhsa04144Endocytosis-5.9
KEGG Pathwayhsa04727GABAergic synapse-5.8
KEGG Pathwayhsa04137Mitophagy - animal-5.5
KEGG Pathwayhsa04915Estrogen signaling pathway-5.5
KEGG Pathwayhsa04621NOD-like receptor signaling pathway-5.1
KEGG Pathwayhsa05202Transcriptional misregulation in cancer-4.2
KEGG Pathwayhsa04720Long-term potentiation-4.1
KEGG Pathwayhsa04668TNF signaling pathway-3.8
KEGG Pathwayhsa04020Calcium signaling pathway-3.7

Note: GO Biological Processes: Gene ontology analysis of biological process. KEGG: Kyoto Encyclopedia of Genes.

Table 2

The GO and KEGG Pathway analysis of genes in the magenta module

CategoryTermDescriptionLog(p-value)
GO Biological ProcessesGO:0043408regulation of MAPK cascade-6.05131099
GO Biological ProcessesGO:0043409negative regulation of MAPK cascade-4.84419894
GO Biological ProcessesGO:1903351cellular response to dopamine-4.25433626
GO Biological ProcessesGO:0035666TRIF-dependent toll-like receptor signaling pathway-3.72907579
GO Biological ProcessesGO:0060398regulation of growth hormone receptor signaling pathway-3.59181535
GO Biological ProcessesGO:1904407positive regulation of nitric oxide metabolic process-3.51272839
GO Biological ProcessesGO:0009190cyclic nucleotide biosynthetic process-3.38699864
GO Biological ProcessesGO:0000226microtubule cytoskeleton organization-3.37279545
GO Biological ProcessesGO:0030258lipid modification-3.34130208
GO Biological ProcessesGO:0016569covalent chromatin modification-3.19790154
GO Biological ProcessesGO:0046856phosphatidylinositol dephosphorylation-3.11446184
GO Biological ProcessesGO:0046777protein autophosphorylation-3.11274876
GO Biological ProcessesGO:0090336positive regulation of brown fat cell differentiation-3.07693529
GO Biological ProcessesGO:0060391positive regulation of SMAD protein signal transduction-3.07693529
GO Biological ProcessesGO:0007099centriole replication-3.06412444
GO Biological ProcessesGO:0031570DNA integrity checkpoint-2.93592311
GO Biological ProcessesGO:0019835cytolysis-2.87954652
GO Biological ProcessesGO:0007030Golgi organization-2.79292724
GO Biological ProcessesGO:0048539bone marrow development-2.684576
KEGG Pathwayhsa00310Lysine degradation-5
KEGG Pathwayhsa04668TNF signaling pathway-3.4
KEGG Pathwayhsa04621NOD-like receptor signaling pathway-3.2
KEGG Pathwayhsa05020Prion diseases-2.4
KEGG Pathwayhsa00514Other types of O-glycan biosynthesis-2.4
KEGG Pathwayhsa04140Autophagy - animal-2.3
KEGG Pathwayhsa04912GnRH signaling pathway-2.2
KEGG Pathwayhsa04070Phosphatidylinositol signaling system-2.1
KEGG Pathwayhsa04210Apoptosis-2.1
KEGG Pathwayhsa04550Signaling pathways regulating pluripotency of stem cells-2

Identification of hub genes

Hub genes in the co-expression network are characterized by high intramodular connectivity which is measured by the value of GS and MM. In Figure 7A and 7B, the scatterplots of GS (y-axis) vs. MM (x-axis) are shown in the blue (R=0.95, p<1×10-200) and magenta (R=0.8, p<4.2×10-172) modules. MM was highly correlated with GS in each module, which indicated that the hub genes in the co-expression modules were highly correlated with endometriosis. With the threshold of |MM| > 0.6 and |GS| > 0.8, using WGCNA, we identified 735 candidate hub genes in the blue module, and 329 candidate hub genes in the magenta module.
Figure 7

Identification of hub genes in endometriosis. (A) Scatterplot of module eigengenes related to EC in the blue co-expression module. (B) Scatterplot of module eigengenes related to EU in the magenta co-expression module. (C) Venn plot of common hub genes in blue module and PPI network. (D) Venn plot of common hub genes in magenta module and PPI network.

For the identification of endometriosis-associated hub genes, we compared the hub genes in the co-expression and PPI networks. We finally identified 16 overlapping hub genes in the blue module (Figure 7C) and 12 overlapping hub genes in the magenta module (Figure 7D). These 28 hub genes are listed in Table 3.
Table 3

The common hub genes of WGCNA and PPI analysis in the blue and magenta modules

Gene SymbolGene DescriptionModuleWGCNAPPI AnalysisLimma Analysis
GSGS P valueMMMM, P valueK ScoreLog2FCFDRUp or Down
ACTG2actin, gamma 2, smooth muscle, entericblue0.9221396467.51E-090.9233661766.54E-095.6673.5481584591.02E-08up
C1Rcomplement C1r subcomponentblue0.8979847337.81E-080.961741511.45E-1142.2085522921.14E-07up
CDH3cadherin 3blue0.9314209362.48E-090.9718366429.56E-136.22.7753389293.97E-09up
CLUclusterinblue0.8635758429.38E-070.8783109343.54E-0742.0446197421.29E-06up
COL8A1collagen type VIII alpha 1 chainblue0.921931417.69E-090.9411729736.47E-1034.3298811431.01E-08Up
GATA6GATA binding protein 6blue0.987634536.14E-160.9727361967.16E-135.6674.8781946981.42E-15up
HOXC6homeobox C6blue0.9443173183.99E-100.908751762.98E-084.83.4874991376.36E-10up
LMOD1leiomodin 1blue0.8801268013.12E-070.9477998492.26E-105.6672.4760356974.14E-07up
MYH11myosin heavy chain 11blue0.9330728382.00E-090.9757519922.52E-135.6673.7083764862.96E-09Up
MYLKmyosin light chain kinaseblue0.9149546461.62E-080.947235912.48E-105.6672.1253678212.54E-08up
MYOCDmyocardinblue0.8599958611.17E-060.9297125453.08E-095.6673.4410411691.43E-06up
PROS1protein S (alpha)blue0.9346073991.64E-090.9842753385.27E-1542.8992542772.67E-09up
SERPING1serpin family G member 1blue0.8590671141.23E-060.9373908491.12E-0942.4247914291.61E-06up
TAGLNtransgelinblue0.9090618492.89E-080.9599031562.20E-115.6672.8156764313.97E-08up
THBS1thrombospondin 1blue0.7644803238.68E-050.844902842.77E-0642.0475750430.000118406up
THBS2thrombospondin 2blue0.8701015016.18E-070.921289718.25E-0932.3971113238.17E-07up
ABCC8ATP binding cassette subfamily C member 8magenta0.7260501150.0009669950.9226240531.35214E-0732.3681062990.000000474Up
COL4A6collagen type IV alpha 6 chainmagenta0.8313032643.54879E-050.8737360484.59798E-064.713.5920492925.82E-08up
COL5A3collagen type V alpha 3 chainmagenta0.8647221397.50448E-060.964963144.02139E-104.712.6908129096.65E-08up
CXCL13C-X-C motif chemokine ligand 13magenta0.6760385942.89E-030.8087871348.47082E-0520.92.6924218450.00000669up
CYP2E1cytochrome P450 family 2 subfamily E member 1magenta0.8133935157.1566E-050.9374725782.85784E-084.52.32193054.78E-08up
CYP4B1cytochrome P450 family 4 subfamily B member 1magenta0.7361119430.0007543640.8583206271.04106E-054.53.07523330.000000839up
DKK1dickkopf WNT signaling pathway inhibitor 1magenta0.6970584890.0018718540.8158609256.53E-054.713.5332659930.000000546up
FSTL3follistatin like 3magenta0.763828630.0003585090.8948317241.24E-065.0672.9979771440.000000244up
KLF4Kruppel like factor 4magenta0.7252610090.0009855740.8843685962.45E-064.713.2238385819.88E-08up
NR4A2nuclear receptor subfamily 4 group A member 2magenta0.8001017770.0001150470.8646860867.52E-064.713.448669344.88E-08up
PROK1prokineticin 1magenta0.8307068443.6373E-050.8079776498.72E-05114.0820226381.93E-08up
WDR27WD repeat domain 27magenta0.7904406330.0001590370.9506698945.02E-0932.2324483429.99E-08up

WGCNA: weight gene co-expression network analysis, PPI: protein-protein interaction, GS: gene significance, MM: module membership, log2FC: log2 (Fold-Change) values of differentially expressed genes.

Validation of hub genes

Independent datasets were used to identify the hub genes. We compared the expression of each hub gene in endometriosis. Fifteen hub genes were differentially expressed between the EC and Ctrl in GSE 7305 (Figure 8A), and seven hub genes were differentially expressed between the EU and Ctrl in GSE 51981 (Figure 8B). Boxplots were used to show the validation results (Figure 8A-B).
Figure 8

Validation of hub genes in the independent GEO datasets. (A) Boxplot shows the hub gene expression in EC and Ctrl. (B) Boxplot shows the hub gene expression in EU and Ctrl (*p< 0.05 versus Ctrl).

Discussion

Endometriosis is a non-malignant gynecological disease whose pathogenesis is still unclear. The absence of biomarkers may contribute to the long delay between disease onset and diagnosis. Hence, it is imperative to identify novel molecular biomarkers that may enable early diagnosis and personalized treatment. For the first time, our study identified endometriosis-associated hub genes using WGCNA, which may hold important clues regarding the pathogenesis of endometriosis, provide valuable resources for the identification of endometriosis biomarkers and thus may improve the clinical management of this disease. WGCNA can produce more robust results compared with other bioinformatics methods 17,18 because it constructs weighted co-expression networks based on the similarities of gene expression profiles and focuses on the correlation between the co-expressed modules and clinical traits. Hub genes are defined as the highly connected nodes that contribute to a phenotype or disease 19. Therefore, this method has been used to identify biologically relevant modules and biomarkers in different diseases 20. Endometriosis is a benign disease, although, similar to cancer, it has characteristics of being invasive and migratory. In our study, EC and EU-associated hub modules were identified. Function enrichment analyses showed that the genes in the blue and magenta modules had different roles and both were significantly associated with endometriosis, which demonstrated our analysis. For example, genes in the EC-associated blue module played roles in autophagy, focal adhesion (the initiation step for disease progression 21) and cancer, all of which were involved in the pathogenesis of endometriosis 22,23. Previous studies showed that S100A7 promoted the development of endometriosis by activating NF-kappaB signaling pathway 24. In our study, the genes in EU-associated magenta module played roles in the regulation of growth hormone receptor signaling pathway, NF-kappaB signaling and GnRH signaling pathway, which induced an estrogen-rich and inflammatory microenvironment involved in cell division, cell movement and survival in endometriosis 20,25-27. As a result, we assume that hormone receptor signaling or inflammatory microenvironment may promote the passing of EU through oviducts and migrating to the ovarian surfaces, and adhesion and autophagy correlated genes may induce the successful ectopic implantation of endometrium (EC) and formation of endometriotic lesions. Taken together, dysregulated genes in the EU may be responsible for the increased propensity of endometrial debris ectopic implantation and for early events that lead to the establishment of lesions. Dysregulated genes in the EC may contribute to the lesion formation and influence the progression of the disease. To better understand the pathogenesis of endometriosis, WGCNA and PPI analyses were used to identify the EC and EU-associated hub genes. Some hub mRNAs, such as TAS2R3, TAS2R41, SERPING1, CASR, CCKAR, GPR55, HCRTR2, CRH, HTR5A, CFTR, and ENAM, were also key enriched genes in the GSEA. For instance, SERPING1 was involved in the complement and coagulation cascades, both NR4A2 and ABCC8 played important roles in ABC transporters, and CYP2E1 was involved in the pathway of steroid hormone biosynthesis. In addition, some identified hub genes of the EC (TAGLN, GATA6, CDH3, CLU, COL8A1, MYH11, MYOCD) and EU (CXCL13, DDK-1, KLF4, CYP2E1, CYP4B1 and PROK1) have been reported be associated with endometriosis. For example, TAGLN may be involved in cell invasion, migration, and differentiation in endometriosis 28. GATA6 is an essential gene in the activation of estrogen synthesis and may become a molecular marker in endometriotic lesions 29,30. Endometrial CXCL13 expression may play an important role in the pathophysiology of endometriosis 31. MiR-200b inhibited invasive growth in endometriosis by targeting KLF4 32. Dysregulated endometrial PROK1 expression may be correlated with the progesterone resistance of endometriosis 33. Most importantly, we discovered some novel and important genes, including HOXC6, PROS1, SERPING1, MYLK, ACTG2 and THBS2 in the ectopic endometrium, and NR4A2, ABCC8, COL4A6, COL5A3, FSTL3 and WDR27 in the eutopic endometrium. For example, HOXC6 was found to regulate the response to hormonal signals, and the overexpression of FSTL3 significantly improved angiogenesis and neovascularization in the induced pluripotent stem cells 34. These hub genes may provide new mechanisms for endometriosis and will be investigated in the future. Our study identified multi-molecule biomarkers in endometriosis. However, some patients of the validation datasets had incomplete clinical information, which affected further data exploration. The identified genes will be further validated by clinical specimens and in vitro experiments for their application in endometriosis.

Conclusion

Our study for the first time analyzed the gene expression files of the eutopic and ectopic endometrium in women with endometriosis using WGCNA, explored the distinct functions of the eutopic and ectopic endometrium, and identified co-expression modules and potential biomarkers for endometriosis. Our study may improve the understanding of the pathogenesis of endometriosis and provide references for endometriosis-associated biomarkers and therapeutic targets.
  32 in total

Review 1.  Nuclear factor-kappaB: a main regulator of inflammation and cell survival in endometriosis pathophysiology.

Authors:  Reinaldo González-Ramos; Sylvie Defrère; Luigi Devoto
Journal:  Fertil Steril       Date:  2012-07-06       Impact factor: 7.329

2.  A general framework for weighted gene co-expression network analysis.

Authors:  Bin Zhang; Steve Horvath
Journal:  Stat Appl Genet Mol Biol       Date:  2005-08-12

3.  Network neighborhood analysis with the multi-node topological overlap measure.

Authors:  Ai Li; Steve Horvath
Journal:  Bioinformatics       Date:  2006-11-16       Impact factor: 6.937

4.  TAGLN expression is deregulated in endometriosis and may be involved in cell invasion, migration, and differentiation.

Authors:  Gabriela Dos Santos Hidalgo; Juliana Meola; Júlio César Rosa E Silva; Cláudia Cristina Paro de Paz; Rui Alberto Ferriani
Journal:  Fertil Steril       Date:  2011-07-20       Impact factor: 7.329

5.  GATA6 expression promoted by an active enhancer may become a molecular marker in endometriosis lesions.

Authors:  Masao Izawa; Fuminori Taniguchi; Tasuku Harada
Journal:  Am J Reprod Immunol       Date:  2019-01-14       Impact factor: 3.886

6.  NLRC5 and autophagy combined as possible predictors in patients with endometriosis.

Authors:  Lei Zhan; Shun Yao; Shiying Sun; Qian Su; Jun Li; Bing Wei
Journal:  Fertil Steril       Date:  2018-10       Impact factor: 7.329

7.  Soluble VCAM-1/soluble ICAM-1 ratio is a promising biomarker for diagnosing endometriosis.

Authors:  L Kuessel; R Wenzl; K Proestling; S Balendran; P Pateisky; G Yerlikaya; B Streubel; H Husslein
Journal:  Hum Reprod       Date:  2017-04-01       Impact factor: 6.918

8.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

9.  WGCNA: an R package for weighted correlation network analysis.

Authors:  Peter Langfelder; Steve Horvath
Journal:  BMC Bioinformatics       Date:  2008-12-29       Impact factor: 3.169

10.  Application of Weighted Gene Co-expression Network Analysis for Data from Paired Design.

Authors:  Jianqiang Li; Doudou Zhou; Weiliang Qiu; Yuliang Shi; Ji-Jiang Yang; Shi Chen; Qing Wang; Hui Pan
Journal:  Sci Rep       Date:  2018-01-12       Impact factor: 4.379

View more
  1 in total

1.  Integrated bioinformatics analysis uncovers characteristic genes and molecular subtyping system for endometriosis.

Authors:  Zhaowei Wang; Jia Liu; Miaoli Li; Lishan Lian; Xiaojie Cui; Tai-Wei Ng; Maoshu Zhu
Journal:  Front Pharmacol       Date:  2022-08-17       Impact factor: 5.988

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.