Sha Lv1, Xiaoxiao Xu1, Zhangying Wu1. 1. Department of Gynecology and Obstetrics, The Affiliated Hospital of Guizhou Medical University, Guiyang, Guizhou 550001, P.R. China.
Endometrial cancer (EC) is the most common gynecologic malignancy in the western world and the fourth most common cancer in women worldwide, with >280,000 cases per year in 2017 (1). The number of estimated deaths caused by EC in 2016 was 10,470, which was 1.8% of all cancer deaths; the five-year survival rate was 81.7% (2). Two histologic categories have been described among adenocarcinomas of the endometrium: type 1 and type 2. Type 1 adenocarcinomas are estrogen-mediated, have an endometrioid histology and are mostly lower grade. They account for 70–80% of new cases. Type 2 tumors occur more frequently in leaner, older women, and this type consists of higher-grade tumors and nonendometrioid histologies (usually serous or clear cell) (3,4). Most patients who are diagnosed at an early stage have a better prognosis than those who are diagnosed at an advanced stage or with recurrent tumors (5). Thus, further investigation is needed to reveal the molecular mechanisms underlying the occurrence and development of EC and to explore candidate biomarkers as targets for earlier and more accurate diagnosis and for treatment, in order to improve the overall survival rate and prognosis of patients with EC. The biological processes of EC have been explored; however, the gene interactions and biological pathways involved have not been accurately verified. In recent decades, with the rapid development and wide application of microarray technology and bioinformatics analysis, studies of diseases have advanced to the genetic level. Increasing evidence has shown that the abnormal expression and mutation of genes, including p53, K-ras, PTEN (6,7) and mismatch repair (MMR) genes (8,9), are associated with the carcinogenesis and progression of EC. Thus, certain genes have the potential to become biomarkers of EC. The differentially expressed genes (DEGs) and pathways involved in EC can be identified using bioinformatics methods.
In the present study, three mRNA microarray datasets were downloaded from the Gene Expression Omnibus (GEO) database in order to reduce the risk of false positives arising from any single dataset. A total of 118 DEGs between EC and noncancerous tissues were screened from the datasets and 11 hub genes were selected as candidate biomarkers for the diagnosis, treatment and prognosis of EC.
Materials and methods
Data sources
The GEO (http://www.ncbi.nlm.nih.gov/geo) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics datasets (10). Three mRNA datasets [GSE63678 (GPL571 platform, Affymetrix Human Genome U133A 2.0 Array) (11), GSE17025 (GPL570 platform, Affymetrix Human Genome U133 Plus 2.0 Array) (12) and GSE3013 (GPL8300 platform, Affymetrix Human Genome U95 Version 2 Array) (13)] were downloaded from GEO. The GSE63678 dataset comprised 18 patients with gynecological cancer and 17 women as the control group. The clinicopathological data are listed in Table SI. Seven samples of EC and five samples of normal tissue were selected for the study. GSE17025 contained 91 samples of EC (79 endometrioid cancer and 12 serous cancer) with a heterogeneous distribution of grade and depth of myometrial invasion (i.e., 9 IAG1, 14 IAG2, 7 IAG3, 14 IBG1, 12 IBG2, 13 IBG3, 7 ICG1, 10 ICG2, and 6 ICG3) and 12 age-matched normal endometrial samples from post-menopausal women as control. GSE3013 contained one sample of endometrial epithelial cells (EECs) from stage I endometrioid carcinomas, one sample of EECs from stage I endometrioid carcinomas treated with oestrogen (E2), one sample of EECs from stage I endometrioid carcinomas treated with tamoxifen (TAM), one sample of EECs from stage II endometrioid carcinomas, two samples of EECs from stage II endometrioid carcinomas treated with E2, one sample of EECs from stage II endometrioid carcinomas treated with TAM, and samples of normal endometrial epithelium in each group (six samples total) as control. Two samples of EC without any treatment and two samples of normal endometrial epithelium were selected for the study.
Identification of DEGs
The DEGs were screened using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r). This interactive web utility can be used to compare groups of samples in a GEO series and identify DEGs across experimental conditions. Probe sets without corresponding gene symbols or with >1 gene symbol were removed. For genes with >1 probe set, the maximal value was retained. The cut-off criteria were set as follows: adj. P-value <0.05 and logFC (log fold change) >1.
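The screening rules described above can be illustrated with a minimal Python sketch. It is not the GEO2R implementation. The column names (`Gene.symbol`, `adj.P.Val`, `logFC`), the `///` separator for probes mapped to several symbols, the use of the absolute logFC (so that downregulated genes are retained) and the reading of "maximal value" as the probe with the largest |logFC| are all assumptions made for illustration, and the rows are invented.

```python
import csv
import io

# Assumed column names of an exported GEO2R result table.
COL_SYMBOL, COL_ADJP, COL_LOGFC = "Gene.symbol", "adj.P.Val", "logFC"


def screen_degs(rows, adj_p_cutoff=0.05, logfc_cutoff=1.0):
    """Return {gene_symbol: logFC} for probes that pass the cut-offs."""
    best = {}
    for row in rows:
        symbol = row[COL_SYMBOL].strip()
        # Drop probes with no symbol or with several symbols ("A///B").
        if not symbol or "///" in symbol:
            continue
        try:
            adj_p = float(row[COL_ADJP])
            logfc = float(row[COL_LOGFC])
        except ValueError:
            continue  # skip rows with missing statistics
        if adj_p >= adj_p_cutoff or abs(logfc) <= logfc_cutoff:
            continue
        # Assumed collapsing rule: keep the probe with the largest |logFC|.
        if symbol not in best or abs(logfc) > abs(best[symbol]):
            best[symbol] = logfc
    return best


# Invented rows for illustration only (not values from the study).
toy_table = """Gene.symbol\tadj.P.Val\tlogFC
GENE_A\t0.001\t2.4
GENE_A\t0.010\t1.3
GENE_B\t0.200\t3.0
GENE_C///GENE_D\t0.001\t2.0
\t0.001\t2.0
GENE_E\t0.030\t-1.8
GENE_F\t0.010\t0.5
"""
degs = screen_degs(csv.DictReader(io.StringIO(toy_table), delimiter="\t"))
up = sorted(g for g, fc in degs.items() if fc > 0)
down = sorted(g for g, fc in degs.items() if fc < 0)
```

In this toy table only GENE_A (upregulated) and GENE_E (downregulated) survive the filter; the other rows fail the symbol, P-value or fold-change criterion.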
Functional annotation and pathway enrichment
The purpose of this step is to explain gene function and find relevant pathways. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene ontology (GO) enrichment analyses were accomplished in the Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/; version 6.8). DAVID is a bioinformatics data resource consisting of an integrated biological knowledge base and analytic tools aimed at systematically extracting the biological significance of genes and proteins from large lists. It provides a comprehensive set of functional annotation tools for investigating the biological mechanisms underlying a list of genes (14). KEGG (https://www.kegg.jp/) is a database resource for understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technologies (15). The GO knowledge base (http://geneontology.org/) is the world's largest source of information on the functions of genes. Three independent ontologies, including the biological process (BP), molecular function (MF) and cellular component (CC) categories were constructed to describe gene product attributes (16,17). P<0.05 was considered to indicate a statistically significant difference.
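As an illustration of what an enrichment P-value measures, the following sketch computes a plain hypergeometric upper-tail probability in Python. DAVID itself reports a modified Fisher's exact P-value (the EASE score), which is more conservative, so this is a simplified stand-in rather than the calculation used by DAVID. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, list_size, term_size, background_size):
    """P(X >= k): probability that at least k of the list_size submitted
    genes are annotated to a term covering term_size of the
    background_size genes, if the list were drawn at random."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 of 11 submitted genes annotated to a term
# that covers 400 of 20,000 background genes.
p_value = hypergeom_upper_tail(10, 11, 400, 20000)
```

A small P-value indicates that the overlap between the gene list and the term is larger than expected by chance; the P<0.05 threshold stated above would then be applied to each term.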
Protein-protein interaction (PPI) network construction and module analysis
The DEGs were mapped using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; http://string-db.org; version 10.0) online database, which is designed to provide a critical assessment and integration of PPIs (18). An interaction with a combined score >0.4 was considered statistically significant. Cytoscape 3.7.1 (19) was used to visualize the PPI network. The most significant module was identified using the Molecular Complex Detection (MCODE) plug-in (20). The selection criteria were: MCODE score >5; degree cut-off=2; node score cut-off=0.2; max depth=100; and k-core=2. GO term and KEGG pathway enrichment were assessed for the functional analysis of the 11 hub genes with degrees ≥45 in the significant module.
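The degree-based selection of hub genes can be sketched in Python as follows. This is a simplified illustration, not the STRING, Cytoscape or MCODE implementation: it only filters edges by combined score, counts distinct interaction partners per node and keeps nodes at or above a degree threshold. The edge list is invented, the function names are hypothetical, and combined scores are assumed to be on a 0 to 1 scale.

```python
from collections import defaultdict


def node_degrees(edges, min_score=0.4):
    """Count distinct partners per protein, keeping edges above min_score."""
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hub_genes(degrees, min_degree):
    """Nodes whose degree is at least min_degree, highest degree first."""
    hubs = [n for n, d in degrees.items() if d >= min_degree]
    return sorted(hubs, key=lambda n: (-degrees[n], n))


# Invented edge list (protein A, protein B, combined score).
toy_edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.80),
    ("P1", "P4", 0.55),
    ("P2", "P3", 0.70),
    ("P3", "P4", 0.30),  # below the cut-off, ignored
    ("P4", "P5", 0.41),
]
degrees = node_degrees(toy_edges)
hubs = hub_genes(degrees, min_degree=2)
```

With real data, the threshold would be set to 45 to mirror the criterion used for the hub genes in the present study.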
Hub gene analysis
A network of the 11 hub genes and their coexpression genes was analyzed using the cBioPortal (http://www.cbioportal.org; version 3.1.0) online platform (21). The biological process analysis was performed using the Biological Networks Gene Ontology tool (BiNGO) plugin of Cytoscape (version 3.0.3) (22). Hierarchical clustering of the hub genes was visualized using the University of California Santa Cruz (UCSC) Xena Functional Genomics Explorer (https://xenabrowser.net/) (23), which showed the differential expression of the hub genes between EC and normal tissue. Overall survival (OS) and disease-free survival (DFS) in relation to hub gene mRNA expression were assessed using Kaplan-Meier curves in the cBioPortal online platform. The expression profiles of cyclin B1 (CCNB1), ubiquitin conjugating enzyme E2 C (UBE2C) and cell division cycle 20 (CDC20) were analyzed and displayed using Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/index.html) (24). The association between expression patterns and tumor grades was analyzed using the online database UALCAN (http://ualcan.path.uab.edu/index.html) (25). These analyses were all based on data from The Cancer Genome Atlas (TCGA) (26).
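For readers unfamiliar with Kaplan-Meier curves, the estimator can be written in a few lines of Python. This is a generic sketch of the product-limit estimator, not the code used by cBioPortal; the function name and the follow-up data are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # events plus censored patients at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented follow-up data: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Two such curves (for example, patients with and without alterations in a hub gene) are then typically compared with a log-rank test to obtain the P-value reported alongside the plot.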
Results
Identification of DEGs in EC
Based on the cut-off criteria of adj. P-value <0.05 and logFC >1, DEGs (455 in GSE63678; 3,600 in GSE17025; and 4,740 in GSE3013) were identified in the EC samples. A total of 118 genes were differentially expressed in all three datasets (Fig. 1), consisting of 27 downregulated genes and 91 upregulated genes.
Figure 1.
Venn diagram of DEGs. DEGs were selected with logFC >1 and adjusted P-value <0.05 among the mRNA expression profiling sets GSE63678, GSE17025 and GSE3013. The three datasets showed an overlap of 118 genes. DEGs, differentially expressed genes.
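The overlap shown in the Venn diagram corresponds to a set intersection, which can be sketched in Python as below. The gene names and logFC values are invented, the function names are hypothetical, and the requirement that the direction of change agrees in every dataset is an assumption made for this sketch rather than a rule stated in the methods.

```python
def common_degs(*deg_sets):
    """Genes present in every DEG set."""
    sets = [set(s) for s in deg_sets]
    return set.intersection(*sets) if sets else set()


def split_by_direction(logfc_by_dataset):
    """Split shared genes into those up- or downregulated in all datasets."""
    shared = common_degs(*(d.keys() for d in logfc_by_dataset))
    up = {g for g in shared if all(d[g] > 0 for d in logfc_by_dataset)}
    down = {g for g in shared if all(d[g] < 0 for d in logfc_by_dataset)}
    return up, down


# Invented {gene: logFC} tables standing in for the three datasets.
dataset_a = {"G1": 2.0, "G2": -1.5, "G3": 1.2, "G4": 1.1}
dataset_b = {"G1": 1.4, "G2": -2.2, "G3": -1.3, "G5": 3.0}
dataset_c = {"G1": 1.1, "G2": -1.1, "G3": 1.9, "G4": 2.0}

shared = common_degs(dataset_a, dataset_b, dataset_c)
up, down = split_by_direction([dataset_a, dataset_b, dataset_c])
```

Here G3 is shared by all three toy datasets but changes in opposite directions, so it is counted in the overlap and excluded from both the upregulated and downregulated groups.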
The biological classification analysis of 118 DEGs was performed using DAVID, including functional and pathway enrichment analyses. Sorting by P-value, the top five GO terms of the BP, MF and CC categories are shown in Fig. 2. The downregulated genes were mainly involved in the positive regulation of transcription from RNA polymerase II promoter in the BP category, in protein binding in the MF category, and mainly constituted the nucleus in the CC category. The upregulated genes were mainly associated with cell division in the BP category, protein binding in the MF category, and mainly constituted the nucleus in the CC category. The KEGG pathway analysis indicated that the downregulated DEGs were primarily enriched in pathways associated with cancer, and the upregulated DEGs were mainly enriched in the cell cycle (Fig. 3).
Figure 2.
The GO terms of the BP, CC and MF categories enrichment of the 118 differentially expressed genes. (A) Upregulated gene enrichment in GO. (B) Downregulated gene enrichment in GO. GO, Gene Ontology; BP, biological process; CC, cellular component; MF, molecular function.
Figure 3.
The KEGG pathway analysis of the 118 differentially expressed genes. The KEGG pathway analysis of (A) upregulated genes and (B) downregulated genes. KEGG, Kyoto Encyclopedia of Genes and Genomes.
PPI network construction and module analysis
Following the prediction by STRING, the PPI network of DEGs was constructed using Cytoscape (Fig. 4A), which resulted in 98 nodes and 1,078 edges. The most significant module, comprising 41 nodes, was obtained using MCODE (Fig. 4A); 11 of these nodes, each with a degree ≥45, were regarded as hub genes (Fig. 4B). The functional analyses using DAVID showed that the hub genes were mainly enriched in the cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation, the p53 signaling pathway and viral carcinogenesis (Table I).
Figure 4.
PPI network of the 118 DEGs. (A) The PPI network of the 118 DEGs; the most significant module is shown in red. (B) The PPI network of the 11 hub genes (degree ≥45). DEGs, differentially expressed genes; PPI, protein-protein interaction.
Table I.
GO and KEGG pathway enrichment analysis of hub genes.
Category | Term | Count | P-value
--- | --- | --- | ---
GOTERM_BP | GO:0051301-cell division | 10 | 6.57×10−15
GOTERM_BP | GO:0007067-mitotic nuclear division | 8 | 1.63×10−11
GOTERM_BP | GO:0031145-anaphase-promoting complex-dependent catabolic process | 5 | 9.33×10−8
GOTERM_BP | GO:0051439-regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle | 4 | 2.68×10−7
GOTERM_BP | GO:0042787-protein ubiquitination involved in ubiquitin-dependent protein catabolic process | 5 | 1.33×10−6
GOTERM_CC | GO:0005829-cytosol | 10 | 1.81×10−6
GOTERM_CC | GO:0005654-nucleoplasm | 9 | 9.89×10−6
GOTERM_CC | GO:0000922-spindle pole | 4 | 2.42×10−5
GOTERM_CC | GO:0005813-centrosome | 5 | 5.53×10−4
GOTERM_CC | GO:0005634-nucleus | 9 | 1.48×10−3
GOTERM_MF | GO:0004693-cyclin-dependent protein serine/threonine kinase activity | 3 | 1.75×10−4
GOTERM_MF | GO:0005524-ATP binding | 6 | 9.33×10−4
GOTERM_MF | GO:0019901-protein kinase binding | 4 | 1.17×10−3
GOTERM_MF | GO:0035173-histone kinase activity | 2 | 2.36×10−3
GOTERM_MF | GO:0005515-protein binding | 10 | 1.49×10−2
KEGG_PATHWAY | hsa04110:Cell cycle | 6 | 9.42×10−8
KEGG_PATHWAY | hsa04114:Oocyte meiosis | 5 | 4.28×10−6
KEGG_PATHWAY | hsa04115:p53 signaling pathway | 4 | 4.78×10−5
KEGG_PATHWAY | hsa04914:Progesterone-mediated oocyte maturation | 4 | 1.05×10−4
KEGG_PATHWAY | hsa05203:Viral carcinogenesis | 3 | 2.20×10−2
GO, Gene Ontology; BP, biological process; CC, cellular component; MF, molecular function; KEGG, Kyoto Encyclopedia of Genes and Genomes. The count was the number of enriched genes in each term. If there were >5 terms enriched in this category, the top five terms were selected according to their P-value.
In the most significant module obtained using MCODE, a total of 11 hub genes were identified with degrees ≥45 (Table II). The cBioPortal online platform was used to analyze and draw a network of the hub genes and their coexpression genes (Fig. 5). The biological process analysis of the hub genes is shown in Fig. 6. Hierarchical clustering showed that the hub genes could largely differentiate the EC samples from the noncancerous samples (Fig. 7). The OS and DFS analyses of the hub genes were performed using Kaplan-Meier curves. Using the data from cBioPortal, EC patients with hub gene alterations showed worse OS and DFS (P<0.05; Fig. 8A and B). Among these genes, cases with a CCNB1 alteration showed worse OS (P<0.05; Fig. 8A), and those without UBE2C and CDC20 alterations showed better DFS (P<0.05; Fig. 8B). The survival curves of cases with alterations in CDC20 and UBE2C showed that OS was also decreased (0.05 […] CCNB1 in the DFS curve. These genes were all upregulated in EC tissues in the three datasets from GEO and were considered to take part in the carcinogenesis or progression of EC. The expression profiles of CCNB1, UBE2C and CDC20 in human tissue were displayed using GEPIA (Fig. 9). CCNB1 mRNA displayed higher levels in tumors of the brain, lymph node, lung, colon, uterus and cervix uteri compared with the matched normal tissues. UBE2C mRNA displayed higher levels in tumors of the brain, lymph node, lung, breast, stomach, colon, ovary, uterus, cervix uteri, bladder and testis compared with normal tissues. Furthermore, CDC20 displayed higher levels in tumors of the brain, lymph node, thymus, lung, colon, ovary, uterus, cervix uteri and bladder compared with the matched normal tissues. The analysis of tumor vs. normal tissues by UALCAN demonstrated that CCNB1, UBE2C and CDC20 had significantly increased expression in EC in the different datasets (Fig. 10). All three had increased expression in serous carcinoma compared with endometrioid carcinoma. In addition, the three genes, particularly UBE2C, showed a tendency toward higher expression in the late stage. An association between the expression of the three genes and body weight was also identified: patients with EC and normal weight had higher expression than those with extreme obesity.
Table II.
Key nodes in the protein-protein interaction network with a degree ≥45.
No. | Name | Degree | Gene title | Function
--- | --- | --- | --- | ---
1 | AURKA | 50 | Aurora kinase A | The protein encoded by AURKA is a cell cycle-regulated kinase that appears to be involved in microtubule formation and/or stabilization at the spindle pole during chromosome segregation.
2 | CCNB1 | 50 | Cyclin B1 | The protein encoded by CCNB1 is a regulatory protein involved in mitosis, which is necessary for regulation of the G2/M transition phase of the cell cycle.
3 | CDK1 | 50 | Cyclin dependent kinase 1 | The protein encoded by CDK1 is a member of the Ser/Thr protein kinase family, which is essential for G1/S and G2/M phase transitions of the eukaryotic cell cycle.
4 | CCNA2 | 48 | Cyclin A2 | The protein encoded by CCNA2 binds and activates cyclin-dependent kinase 2 and thus promotes transition through G1/S and G2/M.
5 | CCNB2 | 47 | Cyclin B2 | Cyclin B2 is a member of the B-type cyclins. The B-type cyclins, B1 and B2, associate with p34cdc2 and are essential components of the cell cycle regulatory machinery. Cyclin B2 is primarily associated with the Golgi region.
6 | KIF2C | 46 | Kinesin family member 2C | KIF2C encodes a kinesin-like protein that functions as a microtubule-dependent molecular motor. The encoded protein can depolymerize microtubules at the plus end, thereby promoting mitotic chromosome segregation.
7 | UBE2C | 46 | Ubiquitin conjugating enzyme E2 C | UBE2C encodes a member of the E2 ubiquitin-conjugating enzyme family. The encoded protein is required for the destruction of mitotic cyclins and for cell cycle progression, and may be involved in cancer progression.
8 | RRM2 | 45 | Ribonucleotide reductase regulatory subunit M2 | RRM2 encodes one of two non-identical subunits for ribonucleotide reductase. Transcription from this gene can initiate from alternative promoters, which results in two isoforms that differ in the lengths of their N-termini.
9 | CDC6 | 45 | Cell division cycle 6 | The protein encoded by CDC6 is highly similar to Saccharomyces cerevisiae Cdc6, a protein essential for the initiation of DNA replication.
10 | KIF11 | 45 | Kinesin family member 11 | The functions of the KIF11 product include chromosome positioning, centrosome separation and establishing a bipolar spindle during cell mitosis.
11 | CDC20 | 45 | Cell division cycle 20 | CDC20 appears to act as a regulatory protein interacting with several other proteins at multiple points in the cell cycle. It is required for nuclear movement prior to anaphase and chromosome separation.
Figure 5.
A network of the hub genes and their co-expression genes. Nodes with bold black outline represent hub genes. Nodes with thin black outline represent the co-expression genes.
Figure 6.
The biological process analysis of the hub genes. The color depth of nodes refers to the corrected P-value of ontologies. The size of nodes refers to the numbers of genes that are involved in the ontologies.
Figure 7.
Hierarchical clustering of hub genes was constructed using the University of California Santa Cruz Xena Functional Genomics Explorer. The samples under the brown bar are non-cancerous samples and the samples under the blue bar are endometrial cancer samples. Upregulated genes are marked in red; downregulated genes are marked in blue.
Figure 8.
Overall survival and disease-free survival analyses of hub genes. Overall survival (A) and disease-free survival analyses (B) of hub genes were performed using cBioPortal online platform. P<0.05 was considered statistically significant.
Figure 9.
The expression profile of CCNB1, UBE2C and CDC20 in human tissues. Gene expression in tumor tissue (marked in red) and in normal tissue (marked in green). The darker the color, the higher the expression. CCNB1, cyclin B1; UBE2C, ubiquitin conjugating enzyme E2 C; CDC20, cell division cycle 20.
Figure 10.
Analysis of CCNB1, UBE2C and CDC20 expression in different tissues by UALCAN. CCNB1, cyclin B1; UBE2C, ubiquitin conjugating enzyme E2 C; CDC20, cell division cycle 20.
Discussion
EC has been the fourth most common cancer in women worldwide in recent years, and the number of estimated deaths due to EC in 2016 was 10,470, which accounts for 1.8% of all cancer deaths (2). Unfortunately, the etiology of EC remains poorly defined, hampering the development of early diagnostic and effective therapeutic options for this disease. Microarray analysis is a powerful technique for monitoring the expression of thousands of genes in a single experiment. Key genes and pathways can be screened by analyzing microarray datasets from different experiments in GEO or other databases to identify the mechanisms underlying biological processes (27,28). In the field of obstetrics and gynecology, microarrays have been used to study diseases such as ovarian cancer, cervical cancer and preeclampsia. The technology is applied to explore the occurrence and development of disease for better detection and treatment (29–31). Thus, the present study sought to apply microarray technology to explore the genetic alterations in EC.
In the present study, three mRNA microarray datasets were downloaded from GEO and were analyzed to acquire DEGs between EC and noncancerous tissues. The selected 118 DEGs contained 27 downregulated genes and 91 upregulated genes. GO term enrichment analyses revealed that the downregulated genes were mainly enriched in the positive regulation of transcription from RNA polymerase II promoter and in protein binding, and mainly constituted the nucleus, whereas the upregulated DEGs were mainly enriched in cell division and protein binding, and mainly constituted the nucleus. Previous studies have reported that these processes, functions and cellular components play important roles in the tumorigenesis or progression of tumors (32–34). KEGG pathway analysis showed that the DEGs were primarily enriched in pathways in cancer and the cell cycle.
A total of 11 hub genes with degrees ≥45 were selected from the most significant module obtained using MCODE.
Among these hub genes, CCNB1, UBE2C and CDC20 showed a correlation with the prognosis of EC patients. Patients with altered expression of CCNB1, UBE2C and CDC20 were more concentrated in type 2 (elderly patients with normal weight, late stages and serous adenocarcinoma), which indicated that these genes are probably involved in the progression and poor prognosis of EC.
The CCNB1 gene is indispensable for the control of the cell cycle at the G2/M (mitosis) transition. The gene product complexes with p34 (cdc2) to form the maturation-promoting factor (MPF; provided by RefSeq, Aug 2017), which may promote the progression to mitosis and augment the cellular growth rate. In various tumor types, the overexpression of CCNB1 was reported to be related to increased mitotic activity during malignant metastasis. CCNB1 was considered to be related to increased proliferative potential in EC (35). Wild-type p53 was reported to mediate the control of CCNB1 (36). As a target gene of the CCAAT-binding factor NF-Y, CCNB1 is upregulated by the complex of mutant p53 and NF-Y, which promotes cell cycle progression and might play a role in the chemoresistance of colorectal carcinoma by inducing DNA damage (37,38). Alterations of TP53 result in an aberrant form of P53 with a longer half-life that accumulates in the cell. The presence of TP53/P53 expression/accumulation and TP53 mutations was associated with an aggressive type of EC (39). Furthermore, a SNP (rs2069433) in the CCNB1 gene was associated with a reduction in EC risk, but the role of SNPs in the CCNB1 gene in the oncogenesis of EC remains to be further studied (40).
UBE2C encodes a member of the E2 ubiquitin-conjugating enzyme family. It is a member of the anaphase promoting complex/cyclosome, which promotes the degradation of several target proteins during cell cycle progression, particularly during the metaphase/anaphase transition. UBE2C may be present in several human neoplasias.
For instance, in esophageal squamous cell carcinoma, as a transcriptional target of FOXM1, UBE2C contributes to the loss of G2/M checkpoint control due to the deregulation of FOXM1 (41). The upregulation of UBE2C in several distinct tumor types has been associated with a highly malignant phenotype and poor survival, suggesting its role in cancer progression (42–44). The present study also showed that the expression of UBE2C was significantly higher in stage III than in stage I and II EC tissues. In malignant tissues, including those of the esophagus, colon and prostate, amplification of the 20q13.1 locus is one of the important mechanisms underlying the aberrant expression of UBE2C (44–49). Wild-type p53 was reported to mediate the suppression of UBE2C expression, whereas mutant p53 acts in the opposite manner (46). Nevertheless, the precise mechanisms of UBE2C in the tumorigenesis and progression of EC should be further investigated.
CDC20 is required for two microtubule-dependent processes, nuclear movement prior to anaphase and chromosome separation (provided by RefSeq, Jul 2008). CDC20 possibly acts as a regulatory protein interacting with several other proteins at multiple points in the cell cycle, and is necessary for the degradation of an S-phase cyclin, which can antagonize anaphase-promoting complex (APC) activity (50). The abnormal regulation of CDC20 may lead to the accumulation of deleterious chromosomal alterations promoting tumor development and progression. In breast cancer, the increased expression of CDC20 is associated with increased chromosomal instability (51). High CDC20 expression was strongly associated with advanced tumor stage in carcinoma of the breast, colon, endometrium and prostate (52). Thus, CDC20 is probably a biomarker for the diagnosis and prognosis of EC and may serve as a therapeutic target.
A number of similar studies have been conducted previously and have reported varying results.
For example, the analysis of the GSE17025 dataset by Liu et al (53) reported that PCDH10, CCL20 and TOP2A were associated with EC and may be potential molecular markers. The main reason for the differences is that, in the present study, the data were obtained from three datasets and some genes were filtered out by the Venn diagram. In addition, although the analytical methods were used correctly, the possibility of error cannot be completely excluded, and using different datasets may lead to different results. Thus, the findings of the present study require experimental validation using samples of EC and normal tissues. Furthermore, there are many other DEGs besides the three key genes selected, which may also be involved in the biological processes of EC.
Nevertheless, the present study provides a new direction for further studies on EC. The DEGs identified in EC tissues may be involved in carcinogenesis or progression. The 11 hub genes in the significant module, particularly CCNB1, UBE2C and CDC20, may be important in the pathogenesis and progression of EC, and may be regarded as biomarkers for the diagnosis or prognosis of EC. However, further investigation is required to determine the definite functions of these genes in EC.