Jintao Cao1, Shuai Sun1, Rui Min2, Ran Li3, Xingyu Fan4, Yuexin Han4, Zhenzhong Feng3, Nan Li1,2. 1. Department of Pathology, Bengbu Medical College, Bengbu, Anhui Province, 233000, People's Republic of China. 2. Department of Pathology, The First Affiliated Hospital of Bengbu Medical College, Bengbu Medical College, Bengbu, Anhui Province, 233003, People's Republic of China. 3. Department of Pathology, The Second Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230601, People's Republic of China. 4. School of Clinical Medicine, Bengbu Medical College, Bengbu, Anhui Province, People's Republic of China.
Abstract
PURPOSE: The aim of this study was to explore potential gene therapy targets for triple-negative breast cancer (TNBC). PATIENTS AND METHODS: Three gene expression profiles (GSE64790, GSE62931, and GSE38959) from the Gene Expression Omnibus (GEO) database were analyzed. The GEO2R analysis tool was used to screen for differentially expressed genes (DEGs) between TNBC and normal tissues, followed by Gene Ontology functional annotation and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of the DEGs. The protein-protein interaction network of DEGs was visualized using Metascape to identify the core genes. Subsequently, transcriptional data for the core genes in patients with breast cancer were investigated in the ONCOMINE database. Kaplan-Meier survival analysis was used to evaluate the prognostic value of core gene expression levels in patients with TNBC. Finally, the clinicopathological and long-term follow-up data of 39 patients with TNBC were retrospectively analyzed at the First Affiliated Hospital of the Bengbu Medical College between January 2014 and July 2020. Immunohistochemistry was used to evaluate the expression and subcellular localization of CCNB2 in TNBC tissues. RESULTS: A total of 66 DEGs were identified between TNBC and normal tissues, including 33 upregulated and 33 downregulated genes in TNBC. Furthermore, a potential protein complex was identified for five core genes. The high expression of these core genes, especially the overexpression of CCNB2, was correlated with a poor prognosis of patients with TNBC. The CCNB2 protein was expressed in the cytoplasm, and its expression was significantly higher in TNBC tissues than that in the adjacent nontumor tissues. Overall survival of patients was significantly correlated with the expression of CCNB2 (p < 0.05). CONCLUSION: CCNB2 may play a crucial role in the development of TNBC and has the potential to be used as a prognostic biomarker for TNBC.
Breast cancer has now surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases. According to reports, the incidence of breast cancer has increased annually over the past few decades, and it has become the most prevalent malignancy in women. Despite advances in diagnosis and treatment, approximately 685,000 patients died from breast cancer worldwide in 2020.1–3 Triple-negative breast cancer (TNBC), a clinical subtype characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER-2) expression, accounts for 12% to 17% of all invasive breast cancers.4 Because TNBC shows highly malignant features, such as strong invasiveness, early metastasis, frequent recurrence, and short survival, it has attracted widespread attention.5
Owing to the abundance of molecular information in public databases such as Gene Expression Omnibus (GEO) and ONCOMINE,6 the mechanisms of cancer progression can now be studied in unprecedented detail. Bioinformatics analysis can also be used to screen for differentially expressed genes (DEGs) between cancer and normal tissues. Identifying these oncogenes or tumor suppressor genes may point to potential biomarkers and suggest new therapeutic strategies for cancers. Among the various bioinformatics methods, DEG analysis is a widely used independent tool for studying gene upregulation and downregulation.7
In this study, we analyzed and validated cancer-related genes using bioinformatics methods to explore new therapeutic targets that could improve the overall prognosis of patients with TNBC.
Materials and Methods
Source of Data
The original data, including the expression profiles of TNBC and non-TNBC tissues, were downloaded from the GEO database. A search of the GEO datasets for “TNBC” returned 4442 results, from which three TNBC-related gene expression profiles (GSE64790, GSE62931, and GSE38959) were selected. This study did not involve any human or animal experiments.
Screening for DEGs
In each profile, the data were divided into TNBC and non-TNBC subsets. The online analysis tool GEO2R was used to analyze the data. An adjusted p-value of < 0.05 and a |log2 fold change (FC)| of ≥ 1.5 were defined as meaningful differences. The three datasets were analyzed separately, and a web-based Venn diagram tool was then used to determine the overlapping DEGs.
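The screening and overlap steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GEO2R or Venn tool implementation: the row layout, the function names `split_degs` and `overlapping_degs`, and the toy values are all hypothetical, and the log fold change is taken on whatever scale GEO2R reports.

```python
from typing import Dict, List, Set, Tuple

# One result row: (gene symbol, adjusted p-value, log fold change).
# This layout is an assumption made for the sketch.
Row = Tuple[str, float, float]

ADJ_P_MAX = 0.05   # adjusted p-value cutoff stated in the text
LOG_FC_MIN = 1.5   # absolute log fold change cutoff stated in the text


def split_degs(rows: List[Row]) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene sets for one dataset."""
    up, down = set(), set()
    for gene, adj_p, log_fc in rows:
        if adj_p < ADJ_P_MAX and abs(log_fc) >= LOG_FC_MIN:
            (up if log_fc > 0 else down).add(gene)
    return up, down


def overlapping_degs(datasets: Dict[str, List[Row]]) -> Tuple[Set[str], Set[str]]:
    """Intersect the up and down sets across all datasets (the Venn core)."""
    ups, downs = zip(*(split_degs(rows) for rows in datasets.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical toy rows, not values from the GEO series.
toy = {
    "set_a": [("G1", 0.001, 2.3), ("G2", 0.20, 3.0), ("G3", 0.01, -1.8)],
    "set_b": [("G1", 0.010, 1.6), ("G2", 0.01, 2.0), ("G3", 0.03, -2.1)],
    "set_c": [("G1", 0.030, 1.5), ("G3", 0.04, -1.5), ("G4", 0.01, 4.0)],
}
common_up, common_down = overlapping_degs(toy)
print(common_up, common_down)
```

Intersecting the upregulated and downregulated sets separately, rather than intersecting all DEGs and splitting afterwards, avoids counting a gene that changes in opposite directions in different datasets as a shared DEG.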
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Enrichment Analyses
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to simultaneously perform GO functional annotation and KEGG pathway enrichment analysis. In GO analysis, p < 0.01 and a count of ≥ 10 were defined as statistically significant. For KEGG pathway analysis, p < 0.01 was considered meaningful.
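To make the enrichment p-value concrete, the sketch below computes a plain hypergeometric upper-tail probability for a term. It is offered as an approximation only: DAVID reports a modified Fisher exact p-value (the EASE score), which is more conservative than this calculation, and the function name and the example background size are assumptions rather than values from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) when `drawn` genes are sampled from `population` genes,
    of which `annotated` carry the term of interest."""
    total = comb(population, drawn)
    top = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, top + 1)
    )
    return hits / total


# Hypothetical example: 10 of 66 DEGs carry a term that annotates
# 300 genes in an assumed background of 20,000 genes.
p_example = hypergeom_upper_tail(10, 20000, 300, 66)
print(p_example < 0.01)
```

With these assumed numbers only about one hit would be expected by chance, so ten hits give a very small p-value and the term would pass the p < 0.01 and count ≥ 10 criteria.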
Protein–Protein Interaction (PPI) Network Construction and DisGeNET Analysis
Metascape was applied to analyze the enriched pathways and processes of DEGs and their adjacent genes, including GO terms for cellular component (CC), biological process (BP), and molecular function (MF) categories and KEGG pathways.8,9 A p-value of < 0.01, enrichment factor of > 1.5, and minimum count of 3 were considered meaningful. A subset of enriched terms was selected, and a network plot was drawn to further determine the relationships among the terms. The following databases were used for PPI enrichment analysis: BioGrid+, InWeb_IM+, and OmniPath+. Moreover, the molecular complex detection (MCODE) algorithm was used to identify tightly connected network components. Metascape-provided DisGeNET analysis was used to study and predict human disease-related genes.
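The three term-level criteria can be expressed as a simple filter. The sketch below assumes that the enrichment factor is the ratio of observed hits to the hits expected by chance; the `Term` class, its field names, and the toy terms are hypothetical and are not Metascape output.

```python
from dataclasses import dataclass


@dataclass
class Term:
    name: str
    p_value: float
    hits: int        # input genes annotated with the term
    list_size: int   # size of the input gene list
    annotated: int   # background genes annotated with the term
    background: int  # size of the background gene set

    @property
    def enrichment_factor(self) -> float:
        # Observed hits divided by the hits expected by chance.
        expected = self.list_size * self.annotated / self.background
        return self.hits / expected


def is_meaningful(term: Term) -> bool:
    """Apply the three cutoffs stated in the text."""
    return term.p_value < 0.01 and term.enrichment_factor > 1.5 and term.hits >= 3


# Hypothetical terms; each failing term violates exactly one criterion.
terms = [
    Term("term_pass", 1e-6, 8, 66, 400, 20000),
    Term("term_few_hits", 0.001, 2, 66, 100, 20000),
    Term("term_high_p", 0.02, 5, 66, 400, 20000),
    Term("term_low_factor", 0.005, 3, 66, 800, 20000),
]
kept = [t.name for t in terms if is_meaningful(t)]
print(kept)
```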
ONCOMINE Database Analysis
Using the ONCOMINE database,10 we determined the mRNA expression levels of the SKA1, CCNB2, CENPF, CENPA, and BIRC5 genes in various cancers.
Kaplan–Meier Analysis
The Kaplan–Meier plotter was used to evaluate the prognostic value of DEGs in TNBC. The patient samples were divided into two groups (high and low expression) based on the median expression level. Using Kaplan–Meier survival plots, the relapse-free survival (RFS) of patients with TNBC was determined, and the hazard ratio was estimated, along with the 95% confidence interval (CI) and the log-rank p-value.
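The median split, the Kaplan–Meier estimate, and the log-rank comparison can be illustrated with a small self-contained Python sketch. It is not the Kaplan–Meier plotter's code: the function names, the rule that ties at the median fall into the low group, and the toy follow-up data are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple

# One patient: (follow-up time in months, True if relapse was observed).
Obs = Tuple[float, bool]


def median_split(expr: Sequence[float], obs: Sequence[Obs]) -> Tuple[List[Obs], List[Obs]]:
    """Split patients into (high, low) groups at the median expression."""
    cut = median(expr)
    high = [o for e, o in zip(expr, obs) if e > cut]
    low = [o for e, o in zip(expr, obs) if e <= cut]
    return high, low


def kaplan_meier(obs: Sequence[Obs]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({time for time, event in obs if event}):
        at_risk = sum(1 for time, _ in obs if time >= t)
        events = sum(1 for time, event in obs if time == t and event)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


def logrank_chi2(a: Sequence[Obs], b: Sequence[Obs]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({time for time, event in list(a) + list(b) if event}):
        n_a = sum(1 for time, _ in a if time >= t)
        n_b = sum(1 for time, _ in b if time >= t)
        d_a = sum(1 for time, event in a if time == t and event)
        d_b = sum(1 for time, event in b if time == t and event)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance


# Hypothetical toy cohort, not patient data from the study.
expr = [5.1, 2.0, 7.3, 1.2, 6.6, 3.3]
obs = [(6, True), (30, False), (4, True), (28, True), (10, True), (36, False)]
high, low = median_split(expr, obs)
print(kaplan_meier(high), kaplan_meier(low), logrank_chi2(high, low))
```

A chi-square value above 3.84 corresponds to p < 0.05 at one degree of freedom. The sketch raises a division error if no events are observed in either group, which a production implementation would need to handle.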
Immunohistochemistry
TNBC specimens confirmed by pathological diagnosis were obtained from the Pathology Department of the First Affiliated Hospital of the Bengbu Medical College. CCNB2 expression was detected by immunohistochemical staining. The tissue sections were deparaffinized and dehydrated following routine protocols, and endogenous peroxidase activity was inactivated with 3% H2O2 in methanol. The primary antibody against CCNB2 (ab185622, Abcam, Cambridge, UK) was diluted to 1/100 with PBS. The tissue sections were then incubated with the relevant antibodies, stained with diaminobenzidine, and counterstained with hematoxylin. Except for the primary antibody, all reagents used in the immunohistochemical experiment were purchased from Fuzhou Maixin Biological Co. Ltd., Fujian Province, China. The positive cells were identified by the obvious brown granules in the cell membrane or cytoplasm.
Statistical Analysis
The Kaplan–Meier method was used for univariate overall survival (OS) analysis. OS was defined as the period from diagnosis to recurrence, metastasis, death, or the end of follow-up. The SPSS software version 22.0 (IBM, New York, NY, USA) was used for all statistical analyses. Statistical significance was set at p < 0.05.
Results
Identification of DEGs
In the GEO database, we selected three TNBC-related gene expression profiles (GSE64790, GSE62931, and GSE38959). GSE62931 contained 600 DEGs, of which 269 were upregulated and 331 were downregulated in TNBC. GSE38959 contained 1550 DEGs, of which 1010 were upregulated and 540 were downregulated in TNBC. In GSE64790, a total of 560 DEGs were detected, including 186 upregulated and 374 downregulated in TNBC. Venn diagram analysis identified a total of 66 overlapping DEGs, of which 33 were upregulated and 33 were downregulated in TNBC (Table 1, Figure 1).
Table 1
Statistics of the Three Microarray Datasets Selected from the Gene Expression Omnibus Database
Dataset ID | Triple-Negative Breast Cancer | Normal | Total
GSE64790 | 3 | 3 | 6
GSE62931 | 47 | 53 | 100
GSE38959 | 30 | 13 | 43
Figure 1
Venn diagrams of DEGs common to three Gene Expression Omnibus datasets. (A) Total DEGs; (B) upregulated DEGs; (C) downregulated DEGs.
Functional Enrichment Analysis of DEGs in Patients with TNBC
The DEG list was entered into the DAVID for GO and KEGG pathway enrichment analyses. The enriched GO terms included the CC, BP, and MF categories. The results of GO analysis revealed that the DEGs were mainly enriched in BP terms related to mitosis and cell proliferation, CC terms related to the nucleus and nucleoplasm, and MF terms related to protein binding. In addition, KEGG pathway analysis revealed that the DEGs were mainly enriched in pathways related to progesterone-mediated oocyte maturation, oocyte meiosis, and the cell cycle (Table 2).
Table 2
Significantly Enriched GO Terms and KEGG Pathways
Category | Term | Description | Count | p-value
BP | GO:0007067 | Mitotic nuclear division | 10 | 1.4E−7
BP | GO:0008283 | Cell proliferation | 10 | 3.6E−6
CC | GO:0005634 | Nucleus | 32 | 4.3E−4
CC | GO:0005654 | Nucleoplasm | 20 | 1.4E−3
MF | GO:0005515 | Protein binding | 42 | 4.8E−3
KEGG pathway | hsa04914 | Progesterone-mediated oocyte maturation | 4 | 3.7E−3
KEGG pathway | hsa04114 | Oocyte meiosis | 4 | 7.3E−3
KEGG pathway | hsa04110 | Cell cycle | 4 | 9.8E−3
Abbreviations: BP, biological process; CC, cellular component; MF, molecular function; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
The results of Metascape analysis showed that the DEGs and their neighboring genes were mainly enriched in cell division, mitotic nuclear division, and cell cycle phase transition (Figure 2A and B). Meanwhile, subnetwork analysis of the PPI network resulted in the identification of the potential protein complex for five core genes (CCNB2, BIRC5, CENPA, CENPF, and SKA1) (Figure 2C and D). Quality control and association analysis using DisGeNET showed that these DEGs were significantly related to the occurrence of invasive breast carcinoma, carcinoma of the male breast, malignant neoplasm of the male breast, and other diseases (p < 0.01) (Figure 2E).
Figure 2
Enrichment analysis of DEGs and neighboring genes in triple-negative breast cancer. (A) Heatmap of enriched GO and KEGG terms, colored based on p-values. (B) Network of enriched GO and KEGG terms, colored based on p-values (terms containing more genes tend to have a more significant p-value). (C) PPI network. (D) Five most significant MCODE components from the PPI network. (E) DisGeNET data for the DEGs.
Transcription Levels of DEGs in Patients with Breast Cancer
Figure 3 shows the number of datasets in which the mRNA expression of the target genes was statistically significantly upregulated (red) or downregulated (blue). The thresholds were set at a p-value of 0.001 and an FC of 1.5. The transcription levels of the core genes in cancers were compared with those in normal tissues using the ONCOMINE database (Figure 4). The data showed that the mRNA expression of BIRC5, CCNB2, CENPA, CENPF, and SKA1 was upregulated in patients with breast cancer. In the Curtis dataset, BIRC5 was upregulated in medullary breast carcinoma, with an FC of 6.014 and a p-value of 9.13E−17. In the Turashvili dataset, CCNB2 was overexpressed in invasive ductal breast carcinoma, with an FC of 4.653 and a p-value of 6.05E−6. In the Curtis dataset, CENPA was overexpressed in invasive ductal breast carcinoma, with an FC of 2.183 and a p-value of 1.27E−115. In The Cancer Genome Atlas dataset, the transcription level of CENPF was significantly higher in patients with invasive lobular breast carcinoma than in normal specimens, with an FC of 6.980 and a p-value of 1.31E−21. In the Turashvili dataset, SKA1 mRNA expression in invasive ductal breast carcinoma showed an FC of 7.501 and a p-value of 2.48E−6.
Figure 3
Transcription levels of the core genes in different types of cancers in the ONCOMINE database (blue: low expression; red: high expression; comparisons are made within the same row).
Figure 4
Expressions of the core genes in different breast cancer research microarrays. (A) CENPF expression in TCGA breast (1: breast, 2: invasive lobular breast carcinoma). (B) SKA1 expression in Turashvili breast (1: ductal breast cell, 2: lobular breast cell, 3: invasive ductal breast carcinoma). (C) BIRC5 expression in Curtis breast (1: breast, 2: medullary breast carcinoma). (D) CENPA expression in Curtis breast (1: breast, 2: invasive ductal breast carcinoma). (E) CCNB2 expression in Turashvili breast (1: ductal breast cell, 2: lobular breast cell, 3: invasive ductal breast carcinoma).
Association Between DEG Expression and Survival of Patients
Kaplan–Meier analysis revealed that the five core genes (CCNB2, CENPF, SKA1, CENPA, and BIRC5) were related to the RFS of patients with TNBC. Patients with higher expression levels had a worse RFS than those with lower expression levels. In particular, the overexpression of CCNB2 was the most unfavorable prognostic factor for RFS of patients with TNBC (hazard ratio = 1.98; 95% CI: 1.28–3.06; p = 0.0018; n = 255), consistent with the lowest log-rank p-value (Figure 5).
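As a rough consistency check on the reported CCNB2 estimate, the sketch below back-calculates the Wald p-value implied by a hazard ratio and its 95% CI. This is an approximation under the assumption that the interval is symmetric on the log scale; the Kaplan–Meier plotter reports a log-rank p-value, so exact agreement is not expected, and the function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_p_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided Wald p-value implied by a hazard ratio and its 95% CI."""
    se = (log(upper) - log(lower)) / (2 * z_crit)  # standard error of ln(HR)
    z = log(hr) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Values reported in the text for CCNB2.
p_wald = wald_p_from_ci(1.98, 1.28, 3.06)
print(f"{p_wald:.4f}")
```

The implied p-value falls in the same range as the reported log-rank p = 0.0018, and the geometric mean of the CI limits, sqrt(1.28 × 3.06) ≈ 1.98, matches the reported hazard ratio, so the three numbers are mutually consistent.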
Figure 5
Prognostic values of mRNA expression levels of the core genes in patients with triple-negative breast cancer (Kaplan–Meier analysis). (A) Association of BIRC5 with RFS in TNBC. (B) Association of CENPA with RFS in TNBC. (C) Association of SKA1 with RFS in TNBC. (D) Association of CENPF with RFS in TNBC. (E) Association of CCNB2 with RFS in TNBC.
CCNB2 Protein Expression in TNBC Tissues
The results of immunohistochemical staining showed that CCNB2 protein expression in TNBC tissues was significantly higher than that in adjacent nontumor tissues. The protein was localized to the cytoplasm, as indicated by brown-yellow granular staining in the TNBC tissues (Figure 6).
Figure 6
Immunohistochemical staining (EnVision Method). (A) CCNB2 is negatively expressed in the adjacent nontumor tissue at ×400 magnification. (B) CCNB2 is positively expressed in the triple-negative breast cancer tissue (magnification, ×400).
Follow-Up
All patients with TNBC were females and were followed up until July 2020. The median follow-up time was 44 months (range: 10–78 months). During follow-up, eight patients (20.5%) died. The median age of the patients was 47 years old; 26 were under 50 years old, and 13 were over 50 years old. The average diameter of the tumor was 2.7 cm; however, in two cases, the tumor was larger than 5 cm.
In 39 patients with complete follow-up data, Kaplan–Meier survival analysis showed that the expression of CCNB2 and the primary location of the tumor were significantly related to OS of the patients (p < 0.05). However, the patient’s age, tumor location, tumor diameter, lymph node metastasis, and distant metastasis were not associated with OS (p > 0.05) (Table 3 and Figure 7).
Table 3
Clinicopathological Characteristics and Kaplan–Meier Univariate Overall Survival Analysis of Patients with Triple-Negative Breast Cancer
Abbreviations: CI, confidence interval; *χ2, log-rank test chi-square value; CCNB2, cyclin B2.
Figure 7
Kaplan–Meier overall survival curves according to the (A) patient age (p = 0.170), (B) tumor size (p = 0.143), and (C) CCNB2 expression (p = 0.022).
Discussion
Breast cancer is a malignant tumor that occurs in breast epithelial tissue.11 Because the connections between breast cancer cells are loose, the cells detach easily, and free cancer cells can spread through the blood or lymph throughout the body, forming life-threatening metastases. All these factors make breast cancer a serious threat to women’s health. TNBC is a unique subtype of breast cancer. It does not express hormone receptors (ER and PR) or HER-2, which makes clinical targeted therapy and endocrine therapy ineffective.12 Chemotherapy is currently the main adjuvant treatment for patients with TNBC. However, its efficacy is limited compared with that of comprehensive therapy, especially in patients who show resistance to chemotherapy drugs.13 Therefore, reliable biomarkers and effective targets for TNBC are urgently needed to improve the prognosis of patients.
The development of second-generation sequencing technology and high-throughput sequencing platforms has generated a large amount of data, which an increasing number of researchers are interpreting using bioinformatics methods. In our study, gene and protein expression analysis based on publicly available bioinformatics databases was performed to screen for potential key genes related to TNBC. Using gene expression profiling data from the GEO database, 66 DEGs were identified between TNBC tissues and normal human breast tissues; these DEGs play an important role in cell proliferation and the cell cycle. By constructing a PPI network, a potential protein complex that was strongly linked to invasive breast cancer was identified. This protein complex was mainly composed of the protein products of the five core genes (BIRC5, CCNB2, CENPA, CENPF, and SKA1). Analysis of these five genes using the GEO and ONCOMINE databases showed that all of them were expressed at significantly higher levels in breast cancer than in normal tissues (p < 0.05). The Kaplan–Meier plotter showed that CCNB2 overexpression is an unfavorable prognostic factor for patients with TNBC. To strengthen the credibility of the results of the bioinformatics analysis, experimental verification was conducted. Immunohistochemical staining indicated the cytoplasmic localization of CCNB2 and demonstrated that CCNB2 expression in TNBC tissue was higher than that in the adjacent nontumor tissue. Clinicopathological correlation analysis further confirmed that high expression of CCNB2 in patients with TNBC was associated with poor survival.
Cyclin family proteins, including cyclins B1 (CCNB1) and B2 (CCNB2), regulate the activity of cyclin-dependent kinases (CDKs). Different cyclins are involved in specific phases of the cell cycle,14–16 and CCNB2 plays an important role in the regulation of the cell cycle. During interphase and mitosis, CCNB2 is located in the Golgi apparatus and participates in its decomposition.17 According to previous reports, CCNB2 usually triggers the G2/M phase transition by activating CDK1, and downregulation of CCNB2 inhibits cell proliferation and promotes cell cycle arrest in the G2/M phase.18–20 A study has shown that metformin downregulates the expression of CCNB2 to increase the rates of apoptosis and cell cycle arrest.21 A high level of CCNB2 is positively correlated with the degree of undifferentiation, the tumor size, lymph node metastasis, distant metastasis, and the clinical stage. In the past few years, the overexpression of CCNB2 in tumor tissues has been shown to be an unfavorable prognostic biomarker in many human cancers, including gastric cancer,22 breast cancer,23 pituitary adenoma,24 nasopharyngeal carcinoma,25 and adrenocortical carcinoma.26 The results of bioinformatics analysis showed that mRNA expression of CCNB2 was significantly associated with the prognosis of patients with TNBC. Owing to the small number of TNBC samples, this study has certain limitations. In the future, we will continue to collect clinical sample information, and we hope that the findings of this study can provide a direction for future TNBC research and clinical treatment.
In this study, BIRC5, SKA1, CENPA, and CENPF were highly expressed in TNBC compared to their expression in normal breast tissues, and CENPA, CENPF, and SKA1 expression levels were significantly correlated with a poor RFS (log-rank p < 0.05). However, the role of these genes in TNBC remains unclear, and more studies are needed.
Conclusions
In summary, CCNB2 protein expression was significantly increased in TNBC tissues and was related to the malignant status and prognosis of patients. The clinical value of CCNB2 has yet to be confirmed by further studies, and the regulatory mechanisms of CCNB2-related signaling pathways in TNBC remain to be studied. Nevertheless, CCNB2 has broad potential as a therapeutic target and prognostic factor for TNBC.