Bioinformatics analysis on enrichment analysis of potential hub genes in breast cancer.

Limin Wei, Yukun Wang, Dan Zhou, Xinyang Li, Ziming Wang, Ge Yao, Xinshuai Wang.

Abstract

BACKGROUND: Despite recent advances in screening, treatment, and survival, breast cancer remains the most common invasive cancer in women. The development of novel diagnostic and therapeutic markers for breast cancer may provide more information about its pathogenesis and progression.
METHODS: We obtained the GSE86374 microarray expression matrix from the Gene Expression Omnibus (GEO) database, consisting of 159 samples (124 breast cancer samples and 35 normal samples). The R language was then used for data processing and differential expression analysis. For all differentially expressed genes (DEGs), "FDR <0.01 and |logFC| ≥1" were selected as thresholds.
RESULTS: In this study, 173 up-regulated genes and 143 down-regulated genes were selected for GO and KEGG enrichment analysis. These genes were significantly enriched in KEGG pathways including phenylalanine metabolism, Staphylococcus aureus infection, and the PPAR signaling pathway. The survival and prognosis associated with the eight selected key genes (DLGAP5, PRC1, TOP2A, CENPF, RACGAP1, RRM2, PLK1, and ASPM) were analyzed with the Kaplan-Meier plotter database.
CONCLUSIONS: Eight hub genes and pathways closely related to the onset and progression of breast cancer were identified. We found that the PPAR signaling pathway, especially PPARγ, plays an important role in breast cancer and suggest that this pathway be the subject of further research. © 2021 Translational Cancer Research. All rights reserved.

Keywords:  Breast cancer; hub genes; key pathways; prognostic markers; survival analysis

Year:  2021        PMID: 35116555      PMCID: PMC8797715          DOI: 10.21037/tcr-21-749

Source DB:  PubMed          Journal:  Transl Cancer Res        ISSN: 2218-676X            Impact factor:   1.241


Introduction

Despite great advances in the diagnosis and treatment of cancer, breast cancer remains a major threat to women's health. Breast cancer is the leading cause of cancer-related death among women worldwide, accounting for 14% of these deaths, and its incidence and mortality rates are expected to increase gradually in the coming years (1). It comprises 22.9% of invasive cancers in women and 16% of all female cancers (2), and at the molecular level it is a heterogeneous disease. According to its molecular characteristics, breast cancer can be divided into three types: BRCA-mutated, hormone receptor (HR: estrogen receptor and progesterone receptor)-activated, and human epidermal growth factor receptor 2 (HER2, encoded by ERBB2)-activated (3). With the wide application of microarray-based gene expression analysis, high-throughput methods can simultaneously detect expression changes in thousands of genes at the mRNA level. Using microarray gene expression profiling, several studies have identified differentially expressed genes (DEGs) that play critical roles in the occurrence and progression of breast cancer and that have the potential to become drug targets and diagnostic markers. Using the R language and related software packages, we explored potential molecular targets and signaling pathways related to the occurrence and development of breast cancer at the genomic level, which may provide a theoretical basis for the discovery of new therapeutic targets. In this study, we identified DEGs between breast cancer tissue and normal breast tissue, and then performed GO term enrichment analysis, KEGG pathway analysis, and PPI network analysis to search for hub genes and key pathways related to breast cancer. We present the following article in accordance with the MDAR checklist (available at http://dx.doi.org/10.21037/tcr-21-749).

Methods

Acquisition of microarray data

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) of the National Center for Biotechnology Information (NCBI) is a comprehensive public database that stores gene expression datasets, original series, and platforms. We downloaded GSE86374, a breast cancer dataset submitted by Rebollar-Vega R and based on the GPL6244 platform ([HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array), from the GEO database; it comprises 124 breast cancer samples and 35 normal samples.

Identification of DEGs

Statistical software R (version 3.6.2, https://www.r-project.org/) and Bioconductor packages (http://www.bioconductor.org/) were used to identify DEGs between breast cancer samples and normal samples. First, microarray data quality was assessed with a normalized unscaled standard error (NUSE) box plot, and unqualified samples were removed. We then used the empirical Bayes method in the "Limma" package to screen for significantly differentially expressed genes, and used the "hugene10sttranscriptcluster" package to annotate the DEGs. P<0.05 was considered statistically significant.
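The empirical Bayes step in limma shrinks each gene's residual variance toward a common prior before forming a t-statistic. The Python sketch below illustrates that shrinkage for a two-group comparison of log2 expression values. It is a simplification, not limma's implementation: the prior degrees of freedom `d0` and prior variance `s0_sq` are supplied by hand here (hypothetical inputs), whereas limma estimates them from all genes, and no P values are computed.

```python
import math
from statistics import mean, variance


def moderated_t(group_a, group_b, d0, s0_sq):
    """Two-group t-statistic with a shrunken (moderated) variance.

    group_a, group_b: log2 expression values of one gene in each group.
    d0, s0_sq: prior degrees of freedom and prior variance. These are
    hypothetical inputs here; limma estimates them from all genes.
    Returns (log2 fold change, moderated t, total degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df_resid = n_a + n_b - 2
    # Pooled residual variance of this gene
    s_sq = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df_resid
    # Posterior variance: weighted average of the prior and gene-wise variances
    s_tilde_sq = (d0 * s0_sq + df_resid * s_sq) / (d0 + df_resid)
    log_fc = mean(group_a) - mean(group_b)
    se = math.sqrt(s_tilde_sq * (1 / n_a + 1 / n_b))
    return log_fc, log_fc / se, d0 + df_resid


# Hypothetical log2 values for one gene
tumour = [8.1, 8.4, 7.9, 8.6]
normal = [6.9, 7.2, 7.0, 7.1]
print(moderated_t(tumour, normal, d0=4.0, s0_sq=0.05))
```

With `d0=0` the statistic reduces to the ordinary pooled two-sample t-statistic; a larger `d0` pulls unusually small gene-wise variances toward the prior, so genes with few replicates and a tiny spread no longer receive extreme statistics.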

GO term and KEGG pathway enrichment analysis

GO term enrichment analysis, covering biological processes, cellular components, and molecular functions, was used to explore the biological significance of the DEGs, with results visualized using the R package "ggplot2". KEGG pathway enrichment analysis of the DEGs was performed, and pathways were visualized with the Bioconductor package "pathview", to find key pathways closely related to the occurrence and progression of breast cancer. Terms and pathways with P<0.05 were considered significantly enriched.
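Over-representation analysis of the kind run by GO and KEGG enrichment tools (for example, the "clusterProfiler" package mentioned in the Results) scores each term with a one-sided hypergeometric test: given N background genes of which K belong to a term, how surprising is it to find k or more term members among n DEGs? A minimal Python sketch of that P value follows. The background and pathway sizes in the example are hypothetical, and the raw P values would still need multiple-testing correction.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P value, P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated to the GO term or KEGG pathway
    n: DEGs tested (drawn from the background)
    k: DEGs annotated to the term
    """
    upper = min(K, n)
    # Sum the hypergeometric probability mass from k up to the largest possible overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 316 DEGs, 7 of which fall in a 70-gene pathway,
# against a background of 20,000 genes.
p = enrichment_p_value(k=7, n=316, K=70, N=20000)
print(f"P = {p:.3g}")
```

Exact integer binomial coefficients are used so that the tail sum stays accurate even for a large background.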

Protein-protein interaction (PPI) network analysis

At the interaction level, the PPI network can help identify hub genes and key gene modules related to breast cancer development. PPI information for the DEGs was obtained from the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://www.string-db.org/). The PPI network was then built in Cytoscape, and module analysis was performed with the Cytoscape plug-in Molecular Complex Detection (MCODE) to clarify the biological significance of the gene modules in breast cancer. P<0.05 was considered statistically significant.
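MCODE begins by weighting every vertex by how densely its neighbourhood is connected: in the published algorithm the weight is the highest k-core number of the vertex's closed neighbourhood multiplied by the density of that k-core. The Python sketch below computes only this vertex-weighting stage on a small adjacency list. It assumes an undirected graph without self-loops, uses density = edges / (|V|(|V|-1)/2), and omits MCODE's later complex-finding and post-processing steps, so it does not reproduce Cytoscape's MCODE scores.

```python
def core_numbers(adj):
    """Core number of every vertex by iterative peeling; adj maps vertex -> set of neighbours."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertices that cannot belong to a (k+1)-core at this stage
        low = [v for v in remaining if degree[v] <= k]
        if not low:
            k += 1
            continue
        for v in low:
            core[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return core


def mcode_vertex_weight(adj, v):
    """k * density of the highest k-core in the closed neighbourhood of v."""
    hood = adj[v] | {v}
    # Induced subgraph on the closed neighbourhood
    sub = {u: adj[u] & hood for u in hood}
    core = core_numbers(sub)
    k = max(core.values())
    members = {u for u, c in core.items() if c == k}
    n = len(members)
    if n < 2:
        return 0.0
    edges = sum(len(sub[u] & members) for u in members) // 2
    density = edges / (n * (n - 1) / 2)
    return k * density


# Hypothetical toy network: a triangle a-b-c with a pendant vertex d on c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print({v: mcode_vertex_weight(adj, v) for v in adj})
```

Vertices sitting inside dense, highly connected neighbourhoods receive the largest weights, which is why MCODE seeds its modules from them.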

Survival analysis of critical genes

In this study, the prognostic value of the key genes was assessed with the Kaplan-Meier plotter (http://kmplot.com/analysis/) database, using survival data of breast cancer patients from The Cancer Genome Atlas (TCGA).
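Survival analyses like those in Figure 4 rest on two standard tools: the Kaplan-Meier product-limit estimator for each patient group and the log-rank test for comparing two curves. The Python sketch below implements both from scratch as an illustration. It is not the Kaplan-Meier plotter's code; how that web tool splits patients into high- and low-expression groups is not modelled, and the example data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)  # total at risk
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)  # total deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of group-A deaths at time t
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Chi-square with 1 df is a squared standard normal: P = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up (months) for high- and low-expression groups
print(kaplan_meier([3, 5, 8, 12, 20], [1, 1, 0, 1, 0]))
print(log_rank_test([3, 5, 8, 12, 20], [1, 1, 0, 1, 0], [10, 15, 24, 30, 36], [0, 1, 0, 1, 0]))
```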

Statistical analysis

FC denotes fold change, the ratio of expression levels between breast cancer samples and normal samples, and log2FC is its base-2 logarithm. By convention, |log2FC| ≥1 (at least a two-fold change) is used as the screening criterion for differential genes. FDR stands for false discovery rate and is obtained by correcting the P values of the differential tests. Because differential expression analysis performs an independent statistical hypothesis test for each of a large number of genes, false positives accumulate. Therefore, the widely accepted Benjamini-Hochberg method was used to correct the raw P values, and the resulting FDR was used as the key indicator for screening DEGs. FDR <0.01 or <0.05 is generally taken as the default threshold.
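The screening rule described above can be sketched in Python as follows: a Benjamini-Hochberg adjustment (the same procedure as R's `p.adjust(method = "BH")`) followed by the FDR and |log2FC| filter. The gene names and values in the example are hypothetical, and the FDR cut-off is a parameter because both 0.01 and 0.05 are mentioned above as common thresholds.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted P values (FDR) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvals, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated DEGs using FDR and |log2FC| thresholds."""
    fdr = benjamini_hochberg(pvals)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical genes, log2 fold changes (cancer vs normal) and raw P values
up, down = select_degs(["G1", "G2", "G3", "G4"], [2.0, -1.5, 0.5, 1.2],
                       [0.001, 0.002, 0.0001, 0.3], fdr_cutoff=0.01)
print(up, down)
```

When expression values are already log2-transformed, as is usual for normalized Affymetrix arrays, log2FC is simply the difference between the two group means.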

Results

The identification of DEGs

The gene expression dataset GSE86374, comprising 159 microarrays based on the GPL6244 platform (124 breast cancer samples and 35 normal samples), was downloaded from the GEO database. Microarray data were preprocessed and differential gene expression was analyzed in the statistical software R. We then selected 316 DEGs based on the cut-off criteria [false discovery rate (FDR) <0.05 and |logFC| ≥1], consisting of 143 significantly down-regulated and 173 significantly up-regulated DEGs, for subsequent bioinformatics analysis (Figure 1A). The expression levels of the top 100 DEGs with fold change >1 are displayed in Figure 1B.
Figure 1

The identification of DEGs. (A) Volcano plot of differential expression. Red represents 173 up-regulated genes and blue represents 143 down-regulated genes. (B) Heat map of DEGs. DEG, differentially expressed gene.

GO term enrichment analysis and KEGG pathway analysis of DEGs

Enrichment analysis of the DEGs was performed with the "clusterProfiler" package. First, the DEGs were analyzed for molecular function, cellular component, and biological process enrichment. In terms of molecular function, the DEGs were enriched in extracellular matrix structural constituent (Figure 2A); in terms of cellular component, in extracellular matrix, fibrillar collagen trimer, spindle, chromosomal region, and condensed chromosome (Figure 2B, Table 2); and in terms of biological process, in chromosome segregation, nuclear division, organelle fission, and extracellular structure organization (Figure 2C, Table 1). Second, KEGG pathway enrichment analysis showed that the DEGs were significantly enriched in phenylalanine metabolism, Staphylococcus aureus infection, and the PPAR signaling pathway (Figure 2D, Table 3).
Figure 2

GO enrichment analysis result of DEGs with |logFC| ≥1: (A) molecular function; (B) cellular component; (C) biological process. (D) Visualization of KEGG pathway enrichment of DEGs in normal and breast cancer tissues (showing hsa03320 pathway). DEG, differentially expressed gene.

Table 1

Biological process enrichment analysis of DEGs (FDR <0.05 and |logFC| ≥1) between normal and breast cancer tissues

ID | Description | P.adjust | qvalue | Count
GO:0007059 | Chromosome segregation | 2.40E–09 | 2.20E–09 | 26
GO:0000070 | Mitotic sister chromatid segregation | 1.02E–08 | 9.30E–09 | 18
GO:0000819 | Sister chromatid segregation | 1.30E–07 | 1.19E–07 | 18
GO:0140014 | Mitotic nuclear division | 3.52E–07 | 3.22E–07 | 21
GO:0098813 | Nuclear chromosome segregation | 4.73E–07 | 4.33E–07 | 20
GO:0000280 | Nuclear division | 3.47E–06 | 3.17E–06 | 24
GO:0051983 | Regulation of chromosome segregation | 1.82E–05 | 1.66E–05 | 12
GO:0048285 | Organelle fission | 1.82E–05 | 1.66E–05 | 24
GO:0031581 | Hemidesmosome assembly | 0.000169 | 0.000155 | 5
GO:0043062 | Extracellular structure organization | 0.000169 | 0.000155 | 22
GO:0060337 | Type I interferon signaling pathway | 0.000704 | 0.000644 | 10
GO:0071357 | Cellular response to type I interferon | 0.000704 | 0.000644 | 10
GO:0030198 | Extracellular matrix organization | 0.000787 | 0.00072 | 19
GO:0034340 | Response to type I interferon | 0.000895 | 0.00082 | 10
GO:0008608 | Attachment of spindle microtubules to kinetochore | 0.001476 | 0.001352 | 6
GO:0007088 | Regulation of mitotic nuclear division | 0.001536 | 0.001406 | 12
GO:0000226 | Microtubule cytoskeleton organization | 0.001666 | 0.001526 | 22
GO:0050000 | Chromosome localization | 0.004205 | 0.00385 | 8
GO:0051303 | Establishment of chromosome localization | 0.004205 | 0.00385 | 8
GO:0030574 | Collagen catabolic process | 0.004654 | 0.004261 | 6
GO:0051783 | Regulation of nuclear division | 0.004654 | 0.004261 | 12
GO:0051310 | Metaphase plate congression | 0.004851 | 0.004442 | 7
GO:0006936 | Muscle contraction | 0.008553 | 0.007832 | 17

DEG, differentially expressed gene; FDR, false discovery rate.

Table 2

Cellular component enrichment analysis of DEGs (FDR <0.05 and |logFC| ≥1) between normal and breast cancer tissues

ID | Description | P.adjust | qvalue | Count
GO:0000793 | Condensed chromosome | 4.38E–05 | 3.94E–05 | 16
GO:0000940 | Condensed chromosome outer kinetochore | 0.000109 | 9.84E–05 | 5
GO:0000776 | Kinetochore | 0.000109 | 9.84E–05 | 12
GO:0005819 | Spindle | 0.000109 | 9.84E–05 | 19
GO:0000775 | Chromosome, centromeric region | 0.000113 | 0.000101 | 14
GO:0030496 | Midbody | 0.000211 | 0.00019 | 13
GO:0000777 | Condensed chromosome kinetochore | 0.000217 | 0.000196 | 10
GO:0031012 | Extracellular matrix | 0.000273 | 0.000246 | 22
GO:0098687 | Chromosomal region | 0.000308 | 0.000277 | 18
GO:0044420 | Extracellular matrix component | 0.000308 | 0.000277 | 7
GO:0000779 | Condensed chromosome, centromeric region | 0.000326 | 0.000294 | 10
GO:0005876 | Spindle microtubule | 0.000579 | 0.000522 | 7
GO:0000922 | Spindle pole | 0.000872 | 0.000785 | 11
GO:0005874 | Microtubule | 0.001537 | 0.001384 | 17

DEG, differentially expressed gene; FDR, false discovery rate.

Table 3

KEGG pathway enrichment analysis of DEGs between normal and breast cancer tissues

ID | Description | P.adjust | qvalue | Count
hsa00360 | Phenylalanine metabolism | 0.037685 | 0.037178 | 4
hsa05150 | Staphylococcus aureus infection | 0.037685 | 0.037178 | 8
hsa03320 | PPAR signaling pathway | 0.037685 | 0.037178 | 7

DEG, differentially expressed gene.

PPI network analysis of DEGs

The 316 DEGs were uploaded to the STRING database (version 11) to construct a PPI network of the differential genes (Figure 3A). To further identify key genes, the network data obtained from STRING were imported into Cytoscape software (version 3.7.1). Using a combined score >0.4 and the MCODE score as screening thresholds, we obtained the network diagram of one functional module (Figure 3B). Based on connectivity, with a degree score >47, we identified eight key genes: DLGAP5, PRC1, TOP2A, CENPF, RACGAP1, RRM2, PLK1, and ASPM (Figure 3C). All hub genes were related to patient prognosis.
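The degree-based part of the hub-gene filter can be sketched as follows: keep STRING interactions whose combined score exceeds 0.4, count each gene's degree (number of distinct interaction partners), and report genes whose degree exceeds 47. Treating the "Degree Score" as plain degree is an assumption (Cytoscape plug-ins may define it differently), the edge-list format with scores on a 0 to 1 scale is hypothetical, and the MCODE module filter is not reproduced here.

```python
from collections import defaultdict


def hub_genes(edges, min_score=0.4, min_degree=47):
    """Return (hubs, degree) from (gene_a, gene_b, combined_score) tuples.

    combined_score is assumed to be on a 0-1 scale; divide STRING's 0-1000
    integer scores by 1000 first.
    """
    partners = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            # Sets make the network undirected and ignore duplicate rows
            partners[a].add(b)
            partners[b].add(a)
    degree = {gene: len(nbrs) for gene, nbrs in partners.items()}
    # Highest degree first, ties broken alphabetically
    hubs = sorted((g for g, d in degree.items() if d > min_degree),
                  key=lambda g: (-degree[g], g))
    return hubs, degree


# Hypothetical edge list with placeholder gene names
edges = [("G1", "G2", 0.9), ("G1", "G3", 0.8), ("G1", "G4", 0.7),
         ("G2", "G3", 0.5), ("G4", "G5", 0.2), ("G2", "G1", 0.9)]
print(hub_genes(edges, min_degree=2))
```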
Figure 3

Protein-protein interaction network analysis of DEGs. (A) Protein-protein interaction (PPI) network of the 316 DEGs. (B) Cluster consisting of 49 nodes and 1,109 edges. (C) Hub genes screened from the DEGs and the PPI network. DEG, differentially expressed gene.

The prognostic value of the eight key genes was analyzed with the Kaplan-Meier plotter, using the TCGA database as the reference dataset to obtain the overall survival of breast cancer patients (Figure 4).
Figure 4

Kaplan-Meier overall survival analysis of the eight hub genes in 1,402 breast cancer patients from the TCGA database. (A) DLGAP5, P=7.3e−08; (B) PRC1, P=5.9e−10; (C) TOP2A, P=3.1e−08; (D) CENPF, P=1.3e−05; (E) RACGAP1, P=1.1e−07; (F) RRM2, P=2.1e−09; (G) PLK1, P=0.0012; (H) ASPM, P=8.3e−07. DEG, differentially expressed gene.

Discussion

To screen for key genes and pathways closely associated with breast cancer, we identified significant DEGs between cancer samples and normal samples and performed a series of bioinformatics analyses. Through significance analysis of microarray data in the statistical software R, we identified 316 DEGs with |logFC| ≥1, including 173 up-regulated and 143 down-regulated DEGs. Bioinformatics analyses of the DEGs, including GO term enrichment analysis, KEGG pathway analysis, and PPI network analysis, identified genes and pathways associated with breast cancer that play important roles in the occurrence and progression of the cancer in different ways. Based on the DEGs, GO term enrichment analysis, and PPI network analysis, eight hub genes were identified. Further analysis showed that the hub genes were associated with cellular components including extracellular matrix, fibrillar collagen trimer, spindle, chromosomal region, and condensed chromosome, and with biological processes such as chromosome segregation, nuclear division, organelle fission, and extracellular structure organization. KEGG pathway analysis showed that the DEGs were enriched in phenylalanine metabolism, Staphylococcus aureus infection, and the PPAR signaling pathway. Peroxisome proliferator-activated receptors (PPARs) are nuclear hormone receptors activated by fatty acids and their derivatives. They comprise three subtypes, PPARα, PPARβ/δ, and PPARγ, which show different expression patterns in vertebrates; each is encoded by a separate gene and binds fatty acids and eicosanoids (4). PPARα promotes the catabolism of circulating or cellular lipids by regulating the expression of genes involved in lipid metabolism in the liver and skeletal muscle, whereas PPARβ/δ plays a role in lipid oxidation and cell proliferation.
PPARγ is a key mediator of adipocyte differentiation and regulates adipocyte metabolism, and it has been reported to act as an important tumor suppressor gene in many malignancies (5). This finding prompted investigation of the expression and mutational status of the PPARγ gene in cancers of a variety of tissues. Previous studies showed that PPARγ is also expressed in malignant tissues, including breast cancer, and that PPARγ ligands can inhibit proliferation and induce differentiation of transformed cells (6,7). It has also been reported that selective PPARδ agonists stimulate the proliferation of human breast cancer cell lines and primary endothelial cells (8). In addition to the key signaling pathways, we analyzed the identified key genes. Interestingly, all key genes, including DLGAP5, PRC1, TOP2A, CENPF, RACGAP1, RRM2, PLK1, and ASPM, were significantly associated with the prognosis of patients with breast cancer (P<0.05). DLGAP5 up-regulation has also been shown to be closely associated with cellular invasion (9), and DLGAP5 is a novel cell cycle-regulated gene that can inhibit the proliferation and invasion of carcinoma cells (10). Although the gene encoding PRC1 is not normally mutated in cancer, some of the canonical PRC1 genes are amplified and dysregulated in many hormone-related cancers, including breast cancer. Hormone-associated cancers have a unique carcinogenic mechanism in which the accumulation of proliferation-induced mutations is hormone driven (11,12). Located in an amplicon downstream of the HER2 amplicon, TOP2A is frequently altered in HER2-amplified tumors (13,14). A large-scale analysis showed that in ER-positive breast cancer, elevated TOP2A expression was an independent prognostic factor and strongly correlated with tumor size, grade, lymph node status, HER2 status, and Ki67 expression (13,15).
Centromere protein F (CENPF) is a cell cycle-related nuclear antigen that is maximally expressed in G2/M cells, is expressed at a low level in G0/G1 cells, and accumulates in the nuclear matrix during S phase. CENPF has been identified as a marker of cell proliferation in several human malignancies, including breast cancer (16,17). RACGAP1 is a known regulator of cytokinesis, and studies have shown that RACGAP1 knockout causes about 30–45% of basal-like breast cancer cells to fail cytokinesis and become multinucleated, which may be an important reason why these cells are unable to proliferate (18,19). RRM2 has been shown to be up-regulated in breast cancer, and miR-204-5p inhibits RRM2 expression by directly targeting it (20). Polo-like kinase 1 (PLK1) was a frequent and strong hit in basal breast cancer cell lines, indicating its importance for the growth and survival of these cells (21), and it is a key oncogenic regulator of the completion of the G2-M phase of the cell cycle (22). For ASPM, siRNA-mediated knockdown has been shown to inhibit tumor cell proliferation (23). In summary, through a series of bioinformatics analyses of DEGs between breast cancer samples and normal samples, we identified eight hub genes and pathways closely related to the occurrence and progression of breast cancer. We found that within the PPAR signaling pathway, PPARγ plays an important role in breast cancer. These genes and pathways may help define the molecular mechanisms underlying the occurrence and progression of breast cancer and hold promise as potential biomarkers and therapeutic targets.

1.  Mutational analysis of the peroxisome proliferator-activated receptor gamma gene in human malignancies.

Authors:  T Ikezoe; C W Miller; S Kawano; A Heaney; E A Williamson; J Hisatake; E Green; W Hofmann; H Taguchi; H P Koeffler
Journal:  Cancer Res       Date:  2001-07-01       Impact factor: 12.701

2.  MgcRacGAP controls the assembly of the contractile ring and the initiation of cytokinesis.

Authors:  Wei-meng Zhao; Guowei Fang
Journal:  Proc Natl Acad Sci U S A       Date:  2005-08-29       Impact factor: 11.205

3.  Nuclear autoantigen p330d/CENP-F: a marker for cell proliferation in human malignancies.

Authors:  G Landberg; M Erlanson; G Roos; E M Tan; C A Casiano
Journal:  Cytometry       Date:  1996-09-01

4.  [The structures and functions of peroxisome proliferator-activated receptors (PPARs)].

Authors:  Nobuyuki Takahashi; Tsuyoshi Goto; Tatsuya Kusudo; Tatsuya Moriyama; Teruo Kawada
Journal:  Nihon Rinsho       Date:  2005-04

5.  Molecular portraits of human breast tumours.

Authors:  C M Perou; T Sørlie; M B Eisen; M van de Rijn; S S Jeffrey; C A Rees; J R Pollack; D T Ross; H Johnsen; L A Akslen; O Fluge; A Pergamenschikov; C Williams; S X Zhu; P E Lønning; A L Børresen-Dale; P O Brown; D Botstein
Journal:  Nature       Date:  2000-08-17       Impact factor: 49.962

6.  Gene expression of topoisomerase II alpha (TOP2A) by microarray analysis is highly prognostic in estrogen receptor (ER) positive breast cancer.

Authors:  A Rody; T Karn; E Ruckhäberle; V Müller; M Gehrmann; C Solbach; A Ahr; R Gätje; U Holtrich; M Kaufmann
Journal:  Breast Cancer Res Treat       Date:  2008-03-14       Impact factor: 4.872

7.  Amplification of HER2 and TOP2A and deletion of TOP2A genes in a series of Taiwanese breast cancer.

Authors:  Jim-Ray Chen; Hui-Ping Chien; Kuo-Su Chen; Cheng-Cheng Hwang; Huang-Yang Chen; Kun-Yan Yeh; Tsan-Yu Hsieh; Liang-Che Chang; Yuan-Chun Hsu; Ren-Jie Lu; Chung-Ching Hua
Journal:  Medicine (Baltimore)       Date:  2017-01       Impact factor: 1.889

8.  Polycomb complexes associate with enhancers and promote oncogenic transcriptional programs in cancer through multiple mechanisms.

Authors:  Ho Lam Chan; Felipe Beckedorff; Yusheng Zhang; Jenaro Garcia-Huidobro; Hua Jiang; Antonio Colaprico; Daniel Bilbao; Maria E Figueroa; John LaCava; Ramin Shiekhattar; Lluis Morey
Journal:  Nat Commun       Date:  2018-08-23       Impact factor: 14.919

9.  Overexpression of CENPF correlates with poor prognosis and tumor bone metastasis in breast cancer.

Authors:  Jingbo Sun; Jingzhan Huang; Jin Lan; Kun Zhou; Yuan Gao; Zhigao Song; Yunyao Deng; Lixin Liu; Ying Dong; Xiaolong Liu
Journal:  Cancer Cell Int       Date:  2019-10-11       Impact factor: 5.722

10.  Polo-like kinase 1 (Plk1) inhibition synergizes with taxanes in triple negative breast cancer.

Authors:  Antonio Giordano; Yueying Liu; Kent Armeson; Yeonhee Park; Maya Ridinger; Mark Erlander; James Reuben; Carolyn Britten; Christiana Kappler; Elizabeth Yeh; Stephen Ethier
Journal:  PLoS One       Date:  2019-11-21       Impact factor: 3.240
