BACKGROUND: The incidence of triple-negative breast cancer (TNBC) is relatively high. Our study aimed to identify differentially expressed genes (DEGs) in TNBC and to explore the key pathways and genes involved. METHODS: Gene expression profiling data (GSE86945, GSE86946, and GSE102088) were obtained from the Gene Expression Omnibus (GEO) database, and DEGs were identified using R software. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID). A protein-protein interaction (PPI) network of the DEGs was constructed with the STRING database and visualized in Cytoscape. Finally, the prognostic value of the hub DEGs in breast cancer patients was assessed with the Kaplan-Meier plotter online tool. RESULTS: A total of 2998 DEGs were identified between TNBC and healthy breast tissue, including 411 up-regulated and 2587 down-regulated DEGs. GO analysis showed that down-regulated DEGs were enriched in gene expression (BP), extracellular exosome (CC), and nucleic acid binding (MF), whereas up-regulated DEGs were enriched in chromatin assembly (BP), nucleosome (CC), and DNA binding (MF). KEGG analysis showed that the DEGs were mainly enriched in pathways such as Pathways in cancer and Systemic lupus erythematosus. The top 10 hub genes were selected from the PPI network by connectivity degree, and 7 of them were significantly associated with adverse overall survival in breast cancer patients (P < .05). Further analysis found that only EGFR was significantly associated with the prognosis of TNBC (P < .05). CONCLUSIONS: Our study showed that the DEGs were enriched in Pathways in cancer, that all of the top 10 hub genes were up-regulated DEGs, and that 7 of them (HSP90AA1, SRC, HSPA8, ESR1, ACTB, PPP2CA, and RPL4) were associated with poor prognosis in breast cancer.
These findings may guide research on the diagnosis and prognosis of TNBC, and further research is needed to evaluate their value as targets for TNBC therapy.
Breast cancer has a relatively high incidence and is the most commonly diagnosed cancer in women worldwide.[ Triple-negative breast cancer is a distinct type of breast cancer in which estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER-2) are all negative; it accounts for about 15% to 20% of all breast cancers.[ Triple-negative breast cancer (TNBC), which has unique biological and clinical features, is more aggressive than other subtypes of breast cancer, with shorter disease-free survival, a higher rate of soft-tissue and visceral metastasis,[ and higher 5-year mortality than non-TNBC.[ At present, breast cancer treatment mainly targets ER, PR, and HER2. TNBC patients, however, cannot benefit from endocrine therapy or these targeted therapies because the targets are absent, which results in poor prognosis and high rates of recurrence, metastasis, and mortality. It is therefore necessary to find the key genes and pathways of TNBC, both to clarify its pathogenesis and to point toward new treatment options.
In our study, we aimed to identify TNBC-specific differentially expressed genes (DEGs). Using Gene Expression Omnibus (GEO) datasets, we identified DEGs by comparing TNBC tissue samples with healthy control tissue samples. GO functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed on the screened DEGs, and a protein-protein interaction (PPI) network of the DEGs was constructed with the STRING database and visualized in Cytoscape to identify hub genes associated with TNBC. Finally, survival analysis of the hub genes derived from the PPI network was performed with the Kaplan–Meier plotter online tool.
Our research provides a comprehensive bioinformatic analysis of DEGs in TNBC and may contribute to a better understanding of the mechanisms, progression, and metastasis of triple-negative breast cancer.
Materials and methods
Microarray data
The data analyzed in our study were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The gene expression profiles (GSE86945, GSE86946, and GSE102088) were generated on the GPL17586 platform (Affymetrix Human Transcriptome Array 2.0). The 158 TNBC samples came from GSE86945 and GSE86946, and the 114 healthy samples came from GSE102088. All data processed in this study are freely available online. Ethics approval and patient consent are not applicable.
Data processing
DEG analysis was conducted using the limma package[ (version 3.8) in R software (x64, version 3.5.3). The oligo Bioconductor package[ was used to read the gene expression data files in CEL format. Background correction, normalization, and standardization were performed with the Robust Multichip Average (RMA) algorithm.[ DEGs were defined as genes with a fold change > 2 (|log2 FC| > 1) and an adjusted P value < .01. A heatmap of the top 20 DEGs (10 up-regulated and 10 down-regulated) was produced with the heatmap package (version 1.0.12) in R.
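The DEG selection rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the study used limma in R, whereas this Python sketch assumes that a log2 fold change and a raw P value are already available for each gene, and that P values are adjusted with the Benjamini–Hochberg method (the default `adjust.method` of limma's `topTable`). The gene names and values in the usage lines are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Label genes 'up', 'down', or None using |log2 FC| > 1 and adjusted P < .01."""
    padj = bh_adjust(pvalues)
    labels = {}
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q < padj_cutoff and lfc > lfc_cutoff:
            labels[gene] = "up"
        elif q < padj_cutoff and lfc < -lfc_cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = None
    return labels


# Hypothetical example: four genes with made-up statistics.
labels = classify_degs(
    ["g1", "g2", "g3", "g4"],
    [2.5, -1.8, 0.4, 3.0],
    [1e-5, 2e-4, 1e-6, 0.2],
)
print(labels)  # {'g1': 'up', 'g2': 'down', 'g3': None, 'g4': None}
```

A fold change > 2 corresponds to |log2 FC| > 1 because log2(2) = 1; the sign of log2 FC then separates up-regulated from down-regulated genes. Gene g3 is highly significant but fails the fold-change cutoff, and g4 has a large fold change but fails the adjusted P cutoff.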
GO and KEGG pathway analysis
Gene Ontology (GO) is an ontology widely used in bioinformatics to annotate genes and gene products on a large scale.[ It covers 3 aspects of biology: biological process (BP), molecular function (MF), and cellular component (CC). KEGG is a database resource built from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies, and it can be used to determine the pathways in which a set of genes is enriched.[ It also covers information resources such as diseases and pathways. GO and KEGG analyses were performed with DAVID (https://david.ncifcrf.gov/), and P < .01 was considered statistically significant.[
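Over-representation of a GO term or KEGG pathway in a gene list is commonly scored with a one-sided hypergeometric (Fisher's exact) test. The sketch below illustrates that test only and is not DAVID's implementation: DAVID reports a modified, more conservative Fisher's exact P value (the EASE score), which this sketch does not reproduce. The counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the uploaded list, k: list genes annotated to the term.
    """
    upper = min(n, K)
    # math.comb returns 0 when asked to choose more items than exist.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 40 of 400 list genes fall in a term that annotates
# 300 of 20000 background genes (about 6 would be expected by chance).
print(enrichment_p(40, 400, 300, 20000))
```

Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; for genome-scale use, a library routine such as a hypergeometric survival function would be the more practical choice.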
PPI network of DEGs
We uploaded the top 1000 down-regulated DEGs and all up-regulated DEGs to the STRING online database (https://string-db.org/), and Cytoscape software (version 3.7.1) was used to analyze the PPI network based on the STRING results. PPI sub-networks were identified with the Molecular Complex Detection (MCODE) plug-in, and sub-networks with an MCODE score > 5 and more than 20 nodes were retained. Nodes with a higher degree of connectivity play a greater role in maintaining network stability. The connectivity degree of each node was calculated with the CytoHubba plug-in, and the 10 genes with the highest connectivity were identified as key genes.
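Degree, the connectivity measure used to rank hub genes, is the number of distinct neighbours of a node. A minimal sketch, assuming the network is available as a list of undirected interaction pairs (for example, exported from STRING) and that ties are broken alphabetically, which is an arbitrary choice; CytoHubba's own tie handling may differ. The edges below are hypothetical.

```python
from collections import defaultdict


def degree_ranking(edges, top_n=10):
    """Rank nodes of an undirected network by degree (number of distinct neighbours)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)  # sets collapse duplicate edges
        neighbours[b].add(a)
    # Highest degree first; ties broken alphabetically (an arbitrary choice).
    ranked = sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))
    return [(node, len(neighbours[node])) for node in ranked[:top_n]]


# Hypothetical toy network; the duplicate A-B edge is counted once.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
print(degree_ranking(toy_edges, top_n=3))  # [('A', 3), ('B', 2), ('C', 2)]
```

Several genes in this study share the same degree (JUN, STAT3, and PPP2CA all have degree 68), so how ties are ordered can matter when a fixed top-N cutoff is applied.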
Survival analysis of hub genes
The Kaplan–Meier plotter (https://www.kmplot.com), which contains microarray gene expression data and survival information from GEO, The Cancer Genome Atlas (TCGA), and the Cancer Biomedical Informatics Grid (caBIG),[ was used to assess the association of the key genes with survival in breast cancer. Each of the top 10 hub genes was queried separately, the probe for each gene was selected with the "only JetSet best probe set" option, and P < .05 was considered statistically significant.
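Kaplan–Meier plotter draws Kaplan–Meier survival curves for patients split into high- and low-expression groups and compares the groups with a log-rank test. The sketch below shows those two calculations on plain Python lists. It is illustrative only: it does not reproduce the tool's patient splitting or hazard-ratio estimation, and the follow-up data are hypothetical.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, S(time))] at each distinct event time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 df) and its P value."""
    pooled = zip(list(times_a) + list(times_b), list(events_a) + list(events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical follow-up (months) for a high- and a low-expression group.
high_t, high_e = [5, 8, 12, 20, 30], [1, 1, 0, 1, 0]
low_t, low_e = [10, 25, 40, 50, 60], [0, 1, 0, 0, 1]
print(kaplan_meier(high_t, high_e))
print(logrank_test(high_t, high_e, low_t, low_e))
```

The log-rank statistic sums, over every event time, the difference between observed and expected deaths in one group, so a P value below .05 indicates that the two survival curves differ by more than chance would suggest.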
Results
Identification of DEGs
We analyzed 158 TNBC tissue samples and 114 healthy control tissue samples from GSE86945, GSE86946, and GSE102088. Using R software, the limma package, and the RMA algorithm, we identified a total of 2998 DEGs between TNBC and healthy control tissue, including 411 up-regulated and 2587 down-regulated DEGs. A heatmap of the top 20 DEGs (the top 10 up-regulated and top 10 down-regulated DEGs) was drawn with the heatmap package in R (Fig. 1).
Figure 1
Heatmap of the top 10 up-regulated and top 10 down-regulated DEGs. Red indicates up-regulation; blue indicates down-regulation. Expression intensity values are the gene expression levels obtained from the R analysis. TNBC = triple-negative breast cancer, HC = healthy control, d-DEGs = down-regulated DEGs, u-DEGs = up-regulated DEGs, DEGs = differentially expressed genes.
GO analyses
GO analysis and KEGG pathway enrichment analysis were performed with DAVID (Table 1), to which all DEGs were uploaded. GO results were divided into biological process (BP), cellular component (CC), and molecular function (MF) terms. Down-regulated DEGs were significantly enriched in BP terms including gene expression, cellular macromolecule biosynthetic process, and regulation of gene expression; CC terms including extracellular exosome, extracellular vesicle, and extracellular organelle; and MF terms including nucleic acid binding, heterocyclic compound binding, and organic cyclic compound binding. Up-regulated DEGs were significantly enriched in BP terms including chromatin assembly, nucleosome organization, and chromatin assembly or disassembly; CC terms including nucleosome, DNA packaging complex, and protein-DNA complex; and MF terms including DNA binding, protein heterodimerization activity, and enzyme binding.
Table 1
Top 3 significantly enriched GO terms.
KEGG pathway analysis
Up-regulated and down-regulated DEGs were uploaded separately. Down-regulated DEGs were enriched in Pathways in cancer, Neurotrophin signaling pathway, MAPK signaling pathway, FoxO signaling pathway, and cGMP-PKG signaling pathway, whereas up-regulated DEGs were significantly enriched in Systemic lupus erythematosus, Alcoholism, and Viral carcinogenesis (Table 2).
Table 2
KEGG pathway analysis of DEGs associated with TNBC.
PPI network of DEGs and identification of key genes
We uploaded the top 1000 down-regulated DEGs and all up-regulated DEGs to STRING to generate the PPI network (Fig. 2). When the STRING results were imported into Cytoscape, the PPI network contained only 822 nodes and 4429 edges (figure not shown). We then used the MCODE plug-in in Cytoscape to identify the primary modules (sub-networks) of the PPI network. With the criteria of MCODE score > 5 and more than 20 nodes, we obtained 6 sub-networks, the top 3 of which are shown in Figure 3. Sub-network 1 had a score of 26.296 (density × number of nodes) and comprised 28 nodes and 355 edges; sub-network 2 had a score of 13.294 and comprised 18 nodes and 133 edges; and sub-network 3 had a score of 10.255 and comprised 56 nodes and 282 edges. Using the CytoHubba plug-in, we calculated the connectivity degree of each node and identified the top 10 genes by degree in the PPI network (Table 3). Epidermal growth factor receptor (EGFR) was the most highly connected gene (degree = 105), followed by heat shock protein 90 alpha family class A member 1 (HSP90AA1, degree = 86), SRC proto-oncogene, non-receptor tyrosine kinase (SRC, degree = 85), heat shock protein family A (Hsp70) member 8 (HSPA8, degree = 78), estrogen receptor 1 (ESR1, degree = 73), actin beta (ACTB, degree = 69), Jun proto-oncogene, AP-1 transcription factor subunit (JUN, degree = 68), signal transducer and activator of transcription 3 (STAT3, degree = 68), protein phosphatase 2 catalytic subunit alpha (PPP2CA, degree = 68), and ribosomal protein L4 (RPL4, degree = 55). All of the top 10 genes are up-regulated DEGs in TNBC.
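The module score reported above is labeled "Density∗#Nodes". Assuming that density means the fraction of possible edges present in an undirected module without self-loops, that is, edges / (n(n − 1)/2), the score can be recomputed from the node and edge counts as sketched below. Under this assumption the sketch reproduces the reported scores of sub-networks 1 and 3. It is an illustration, not the MCODE source code.

```python
def module_score(n_nodes, n_edges):
    """Edge density of an undirected module (no self-loops) times its node count."""
    max_edges = n_nodes * (n_nodes - 1) / 2  # possible edges among n nodes
    density = n_edges / max_edges
    return density * n_nodes


# Node and edge counts of sub-networks 1 and 3 as reported above.
print(round(module_score(28, 355), 3))  # 26.296
print(round(module_score(56, 282), 3))  # 10.255
```

Because density is multiplied by node count, the score simplifies to 2 × edges / (n − 1), which rewards modules that are both dense and reasonably large.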
Figure 2
Protein-protein interaction network of the top 1000 down-DEGs and all up-DEGs. DEGs = differentially expressed genes.
Figure 3
Top 3 primary modules of the protein-protein interaction sub-networks identified with the Molecular Complex Detection (MCODE) plug-in in Cytoscape software. (A) module 1; (B) module 2; (C) module 3.
Table 3
Top 10 hub genes with higher degree of connectivity.
To assess the prognostic value of the top 10 hub genes, we queried them in the breast cancer dataset of the Kaplan–Meier plotter, which contains 3951 patients. The results showed that 7 of the top 10 hub genes (HSP90AA1, SRC, HSPA8, ESR1, ACTB, PPP2CA, and RPL4) were significantly associated with adverse overall survival in breast cancer patients. Further analysis found that EGFR was significantly associated with prognosis in TNBC (Fig. 4).
Figure 4
Kaplan–Meier analysis. 1–10, Kaplan–Meier analysis of overall survival (months) of the top 10 key genes in breast cancer patients; 11, Kaplan–Meier analysis of EGFR in triple negative breast cancer.
Discussion
TNBC is a distinct pathological subtype of breast cancer with high invasiveness, recurrence, and mortality rates and a poor prognosis,[ and it occurs mostly in premenopausal young women, especially African women.[ Numerous studies have investigated DEGs associated with TNBC.[ However, the pathogenesis of TNBC is still unclear, so it is necessary to continue analyzing DEGs in TNBC with the latest data.
In this bioinformatics study, 3 GEO datasets (GSE86945, GSE86946, and GSE102088), comprising 158 TNBC samples and 114 healthy control samples, were used to extract DEGs. We found 2998 genes differentially expressed between TNBC and healthy controls, including 411 up-regulated and 2587 down-regulated DEGs. DAVID analysis showed that down-regulated DEGs were enriched in gene expression, cellular macromolecule biosynthetic process, regulation of gene expression, extracellular exosome, extracellular vesicle, extracellular organelle, nucleic acid binding, heterocyclic compound binding, and organic cyclic compound binding, whereas up-regulated DEGs were enriched in chromatin assembly, nucleosome organization, chromatin assembly or disassembly, nucleosome, DNA packaging complex, protein-DNA complex, DNA binding, protein heterodimerization activity, and enzyme binding. We selected the top 10 hub genes by connectivity degree in the PPI network, and 7 of them (HSP90AA1, SRC, HSPA8, ESR1, ACTB, PPP2CA, and RPL4) were significantly associated with the prognosis of breast cancer. However, further analysis found that only EGFR was significantly associated with the prognosis of TNBC, which was the opposite of what we had predicted. This may be due to the relatively small number of cases available for analysis, and more patients need to be collected for analysis.
Although the hub genes we identified differ from those of previous studies,[ all of these studies provide reference directions for the diagnosis, treatment, and prognosis of TNBC. The differences may be due to different numbers of cases analyzed, different identification criteria, experimental error, or other reasons.
HSP90AA1 (heat shock protein 90 alpha family class A member 1) is highly expressed in the brain, testis, placenta, gall bladder, and adrenal gland,[ and plays a significant role in the activity and stability of many proteins responsible for tumor initiation, progression, and metastasis.[ HSP90AA1 can bind and hydrolyze ATP, which is essential for its chaperone function.[ Zeng[ also identified HSP90AA1 as a DEG in breast cancer. However, Jarzab et al[ reported that it is expressed at low levels in breast cancer, which is the opposite of our results, whereas our results are somewhat consistent with the study of Cheng.[ Perhaps gene expression differs between breast cancer in general and triple-negative breast cancer.
SRC (SRC proto-oncogene, non-receptor tyrosine kinase) is highly expressed in the stomach, testis, gall bladder, and duodenum.[ SRC can regulate the expression of some enzymes that mediate tumor biological activity,[ and it is activated in many human cancers.[ Several previous experiments in mouse models showed that SRC plays a vital role in mammary gland development and breast cancer progression,[ and SRC is highly expressed in most breast cancer patient samples.[ In addition, Abdullah found that SRC can promote the expression of MYC mRNA, which encodes a transcription factor important in breast cancer, in estrogen receptor-positive breast cancer.[ In vitro experiments show that increased SRC kinase activity can promote breast cancer invasiveness.[ Importantly, some SRC inhibitors have been developed for targeted therapy of TNBC,[ so we consider SRC a target gene in TNBC. Targeted therapy against SRC, as well as its role in the prevention and early diagnosis of TNBC, requires further study.
HSPA8 (heat shock protein family A (Hsp70) member 8) belongs to the HSP70 (heat shock protein 70) family, which contains 13 members.[ According to previous reports, Hsp70 can promote cancer cell growth through different mechanisms.[ Experiments have shown that HSC70, the protein encoded by HSPA8, was a minor target of somatic mutations and deletions in breast cancer.[ HSPA8 is highly expressed in many tumors;[ it is a stress protein closely related to the occurrence and prognosis of various tumors and plays a significant regulatory role in tumor cell proliferation and apoptosis.[ Interestingly, some researchers have shown that HSPA8 can be used as an early biomarker for liver cancer.[ In our study, HSPA8 was highly expressed in TNBC compared with healthy control samples, and our survival analysis showed that it was associated with poor prognosis in breast cancer patients. We therefore consider HSPA8 a potential biomarker, prognostic factor, and even therapeutic target for TNBC, consistent with a previous report.[
In addition to HSP90AA1, SRC, and HSPA8, the genes ESR1, ACTB, PPP2CA, and RPL4 were also up-regulated in TNBC tissue samples compared with healthy breast tissue samples, and all were associated with poor prognosis in breast cancer patients. However, we found that down-regulated DEGs were enriched in Pathways in cancer (hsa05200), the pathway containing the largest number of genes (P = 1.33E–04). To date, there has been no report that low expression of genes in Pathways in cancer promotes cancer development, so we speculate that there may be another, as yet unknown, pathogenic mechanism in TNBC. Further study of the pathogenesis of TNBC and of the relationship between Pathways in cancer and TNBC is therefore needed.
Conclusion
Our study identified 411 up-regulated and 2587 down-regulated genes between TNBC and healthy breast tissue from 3 GEO series. Based on connectivity degree in the PPI network, we selected the top 10 hub genes, and Kaplan–Meier plotter analysis showed that 7 of them (HSP90AA1, SRC, HSPA8, ESR1, ACTB, PPP2CA, and RPL4) were associated with poor prognosis in breast cancer patients. These TNBC-associated hub genes may provide new directions for research on TNBC.
Authors: Marissa V Powers; Keith Jones; Caterina Barillari; Isaac Westwood; Rob L M van Montfort; Paul Workman Journal: Cell Cycle Date: 2010-04-15
Authors: Don A Delker; David R Geter; Barbara C Roop; William O Ward; Gene J Ahlborn; James W Allen; Gail M Nelson; Ming Ouyang; William Welsh; Yan Chen; Thomas O'Brien; Kirk T Kitchin Journal: J Biochem Mol Toxicol Date: 2009 Nov-Dec
Authors: Mohammed S Fayaz; Mustafa S El-Sherify; Amany El-Basmy; Sadeq A Zlouf; Nashwa Nazmy; Thomas George; Susan Samir; Gerges Attia; Heba Eissa Journal: Rep Pract Oncol Radiother Date: 2013-09-26
Authors: Ernest Turro; Daniel Greene; Anouck Wijgaerts; Chantal Thys; Claire Lentaigne; Tadbir K Bariana; Sarah K Westbury; Anne M Kelly; Dominik Selleslag; Jonathan C Stephens; Sofia Papadia; Ilenia Simeoni; Christopher J Penkett; Sofie Ashford; Antony Attwood; Steve Austin; Tamam Bakchoul; Peter Collins; Sri V V Deevi; Rémi Favier; Myrto Kostadima; Michele P Lambert; Mary Mathias; Carolyn M Millar; Kathelijne Peerlinck; David J Perry; Sol Schulman; Deborah Whitehorn; Christine Wittevrongel; Marc De Maeyer; Augusto Rendon; Keith Gomez; Wendy N Erber; Andrew D Mumford; Paquita Nurden; Kathleen Stirrups; John R Bradley; F Lucy Raymond; Michael A Laffan; Chris Van Geet; Sylvia Richardson; Kathleen Freson; Willem H Ouwehand Journal: Sci Transl Med Date: 2016-03-02
Authors: Linn Fagerberg; Björn M Hallström; Per Oksvold; Caroline Kampf; Dijana Djureinovic; Jacob Odeberg; Masato Habuka; Simin Tahmasebpoor; Angelika Danielsson; Karolina Edlund; Anna Asplund; Evelina Sjöstedt; Emma Lundberg; Cristina Al-Khalili Szigyarto; Marie Skogs; Jenny Ottosson Takanen; Holger Berling; Hanna Tegel; Jan Mulder; Peter Nilsson; Jochen M Schwenk; Cecilia Lindskog; Frida Danielsson; Adil Mardinoglu; Asa Sivertsson; Kalle von Feilitzen; Mattias Forsberg; Martin Zwahlen; IngMarie Olsson; Sanjay Navani; Mikael Huss; Jens Nielsen; Fredrik Ponten; Mathias Uhlén Journal: Mol Cell Proteomics Date: 2013-12-05