Liqi Li1, Hu Huang1, Mingjie Zhu1, Junqiang Wu1. 1. Department of Breast Surgery, Affiliated Hospital of Jiangnan University, Wuxi, Jiangsu, People's Republic of China.
Abstract
PURPOSE: Triple negative breast cancer (TNBC) is an intrinsic subtype of breast cancer with a poor prognosis, characterized by a lack of ER and PR expression and the absence of HER2 amplification. The aim of this study is to characterize hub genes (key genes in the molecular interaction network) expression in TNBC, which may serve as prognostic predictors for TNBC treatment. METHODS: Four transcriptome microarray datasets GSE27447, GSE39004, GSE43358 and GSE45827 were obtained from the Gene Expression Omnibus (GEO) database, and R package limma and RobustRankAggreg were employed to identify common differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted by DAVID and KOBAS database. Thereafter, protein-protein interaction (PPI) network was constructed according to STRING online database. Functional modules and hub genes were screened by MCODE and cytohubba plug-ins, and the Cancer Genome Atlas (TCGA) survival analysis and qRT-PCR were utilized to validate the expression of these hub genes on TNBC. RESULTS: A total of 134 DEGs were identified by differential expression analysis, consisting of 88 up- and 46 down-regulated genes. GO and KEGG analyses showed that the terms and pathways enriched were mainly associated with cell adhesion, tumorigenesis and cellular immunity. From the PPI network, we identified six hub genes, including CD3D, CD3E, CD3G, FYN, GRAP2 and ITK. Survival analysis and the qRT-PCR results confirmed the robustness of the identified hub genes. CONCLUSION: This study provides a new insight into the understanding of the molecular mechanisms associated with TNBC and suggested that the hub genes may serve as prognostic predictors.
PURPOSE: Triple negative breast cancer (TNBC) is an intrinsic subtype of breast cancer with a poor prognosis, characterized by a lack of ER and PR expression and the absence of HER2 amplification. The aim of this study is to characterize hub genes (key genes in the molecular interaction network) expression in TNBC, which may serve as prognostic predictors for TNBC treatment. METHODS: Four transcriptome microarray datasets GSE27447, GSE39004, GSE43358 and GSE45827 were obtained from the Gene Expression Omnibus (GEO) database, and R package limma and RobustRankAggreg were employed to identify common differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted by DAVID and KOBAS database. Thereafter, protein-protein interaction (PPI) network was constructed according to STRING online database. Functional modules and hub genes were screened by MCODE and cytohubba plug-ins, and the Cancer Genome Atlas (TCGA) survival analysis and qRT-PCR were utilized to validate the expression of these hub genes on TNBC. RESULTS: A total of 134 DEGs were identified by differential expression analysis, consisting of 88 up- and 46 down-regulated genes. GO and KEGG analyses showed that the terms and pathways enriched were mainly associated with cell adhesion, tumorigenesis and cellular immunity. From the PPI network, we identified six hub genes, including CD3D, CD3E, CD3G, FYN, GRAP2 and ITK. Survival analysis and the qRT-PCR results confirmed the robustness of the identified hub genes. CONCLUSION: This study provides a new insight into the understanding of the molecular mechanisms associated with TNBC and suggested that the hub genes may serve as prognostic predictors.
Breast cancer is one of the most common malignant tumors in women and a leading cause of cancer-associated deaths worldwide.1 At present, breast cancer is categorized into four intrinsic molecular subtypes clinically, based on the presence of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) and ki67.2 Triple negative breast cancer (TNBC) represents 10% to 20% of all breast cancer cases, and is characterized by the lack of ER and PR expression and the absence of HER2 amplification.3 Compared with other molecular subtypes of breast cancer, TNBC is more commonly diagnosed in young women and is more prone to relapse and metastasis.4–6 Moreover, TNBC patients cannot benefit from endocrine or HER2 targeted therapy due to the absence of molecular targets, and this leads directly to poor clinical outcome. Therefore, it is necessary to further explore the molecular mechanisms underlying TNBC pathogenesis.During the last decades, the clinical and molecular heterogeneity of breast cancer has been widely recognized. The development and widespread application of high-throughput technologies, including microarray and RNA-sequencing, has provided a novel understanding of the molecular complexity of this disease.7–9 Several studies have demonstrated that TNBC cells express genes characteristic of normal basal/myoepithelial cells, such as KRT5, KRT14, KRT17 and EGFR.10,11 Some molecular features of TNBC have been discovered, including a high frequency of TP53 mutations, aberrant activation of PI3K pathway, inactivation of BRCA1 and BRCA2, RB1 loss, and cyclin E1 amplification.12 However, little is known about the etiological factors which promote the initiation and development of TNBC, and the molecular mechanisms underlying TNBC are far from elucidated.Therefore, we carried out a series of transcriptome analysis using bioinformatics methods, trying to identify differentially expressed genes (DEGs) between TNBC and non-TNBC samples by analyzing several transcriptome microarray datasets from the Gene Expression Omnibus (GEO) database, and investigate their potential biological functions by performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Furthermore, protein–protein interaction (PPI) network was constructed, and six hub genes, which play key roles in TNBC, were screened out. Our study provides a new insight into the understanding of the molecular mechanisms of TNBC, and the identified hub genes may serve as prognostic predictors.
Materials and Methods
Downloading and Processing of Microarray Datasets
Microarray datasets (GSE27447,13 GSE39004,14 GSE4335815 and GSE4582716) were downloaded from GEO (URL: ) database for differential expression analysis. The inclusion criteria for the microarray datasets were as follows: (a) samples containing TNBC and non-TNBC tissues, (b) primary tumors with no adjuvant treatment, (c) the total RNA samples for transcriptome testing, (d) expression profiling by the array as the study type, and (e) Homo sapiens as the organism. The probe names were annotated to Entrez ID according to the corresponding GEO platform files. Detailed information of the four microarray datasets is shown in .
Identification of DEGs
The differential expression analysis, for each of the selected GEO datasets, was performed using R package limma (URL: ), and the robust multi-array average (RMA) method was used to do the background correction. The PAM50 molecular subtyping was performed by using R package genefu (URL: ). Furthermore, to screen out the common DEGs from all these datasets, we employed the robust rank aggregation (RRA) analysis by applying R package RobustRankAggreg (). This algorithm is more credible for DEG identification from different datasets, since it is robust to outliers, noise and errors. Adjusted P-value (Adj.P) <0.05 and |log2 fold-change (FC)| > 1 were used as a threshold.
Functional and Pathway Enrichment Analyses
To further understand the biological functions of these DEGs, GO enrichment analysis of three categories, including biological process (BP), cellular component (CC) and molecular function (MF) were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID; version 6.8, URL: ). KEGG pathway analysis was conducted using the KOBAS database (version 3.0, URL: ). False discovery rate (FDR) <0.05 was considered to be of statistical significance.
Construction of PPI Network and the Identification of Hub Genes
To investigate the interactions of proteins encoded by the identified DEGs, we constructed a PPI network by using Cytoscape (version 3.7.1) software, according to the Search Tool for the Retrieval of Interacting Genes (STRING; version 11.0, URL: ) database. The combined score of 0.7 was considered as a threshold. Thereafter, MCODE (version 1.5, URL: ), a plug-in of Cytoscape, was used to screen significant functional modules from the PPI network, and another plug-in cytohubba (version 0.1, URL: apps.cytoscape.org/apps/cytohubba) was used for hub gene identification. In addition, maximal clique centrality (MCC) method was the algorithm for hub genes identification.
Survival Analysis Using Samples from TCGA Database
To validate the oncogenetic effect of the identified hub genes, we performed survival analyses of TCGA (URL: ) samples by using the Gene Expression Profiling Interactive Analysis (GEPIA; URL: ) online database. TNBC samples were derived into high- and low-expression groups for each hub gene. The median transcript per million (TPM) value was used as a cutoff. To further verify the association between the hub genes and breast cancer outcome, Cox regression analysis was performed in TCGA samples from multiple malignancies, including breast cancer (BRCA), cervical squamous cell carcinoma (CESC), liver hepatocellular carcinoma (LIHC), head and neck squamous cell carcinoma (HNSC), lung adenocarcinoma (LUAD), sarcoma (SARC) and skin cutaneous melanoma (SKCM). P-value <0.05 was considered statistically significant.
Clinical Samples Collection and Storage
Twenty TNBC and 20 non-TNBC clinical samples were collected from patients who were primarily diagnosed with breast cancer between 2013 and 2019 at the Affiliated Hospital of Jiangnan University. All clinical samples were diagnosed by senior pathologists, and the proportions of tumor cells were over 85%. The specimens were immediately transferred into the gas phase of liquid nitrogen after removal, and subsequently stored at −80°C until further processing. All the patients who were enrolled in the present study had signed informed consent.
RNA Extraction and Quantitative Reverse Transcriptional PCR (qRT-PCR)
Total RNA was extracted from clinical samples by using the E.Z.N.A Total RNA Kit I (Omega, USA) following the protocol. Reverse transcriptional PCR was performed to synthesis complementary DNA (cDNA) by using the Prime Script RT Reagent Kit (Takara Bio, Japan). CFX96 Touch Real-Time PCR Detection System (Bio-rad, USA) was employed to perform real-time PCR, and the comparative Ct method was used to measure the expression levels of the hub genes. Glyceraldehyde‐3‐phosphate dehydrogenase (GAPDH) was considered as a normalization control. The sequences of the primers are demonstrated in .
Results
In this study, we performed a multistep bioinformatic analysis to screen key genes of TNBC, and the flow diagram is shown in Figure 1. Firstly, to verify the grouping of TNBC and non-TNBC samples in the four datasets (GSE27447, GSE39004, GSE43358 and GSE45827), we performed PAM50 subtyping, and the heatmaps of each dataset are shown in –. According to the result, most of the TNBC samples were of basal-like subtype, and this confirmed the representativeness of the TNBC datasets. Subsequently, we performed a differential expression analysis on the datasets, which contain a total of 77 TNBC and 186 non-TNBC samples. The distribution of the DEGs in each dataset is shown in Figure 2A–D. Furthermore, the common DEGs among the four datasets were identified by RRA method, including 88 up- and 46 down-regulated genes (Figure 2E). The expression heatmap of top 10 up- and down-regulated DEGs is demonstrated in Figure 2F.
Figure 1
Flow diagram of the bioinformatics analysis in the present study.
Figure 2
Identification of DEGs in four TNBC microarray datasets from GEO.
Notes: (A–D) Volcano plots of differential expression analysis for GSE27447, GSE39004, GSE43358 and GSE45827. (E) Upset plot of DEGs overlapped in the four datasets. (F) Expression heatmap of top 10 up- and down-regulated genes.
Flow diagram of the bioinformatics analysis in the present study.Identification of DEGs in four TNBC microarray datasets from GEO.Notes: (A–D) Volcano plots of differential expression analysis for GSE27447, GSE39004, GSE43358 and GSE45827. (E) Upset plot of DEGs overlapped in the four datasets. (F) Expression heatmap of top 10 up- and down-regulated genes.Abbreviations: TNBC, triple negative breast cancer; GEO, Genome Expression Omnibus; DEGs, differentially expressed genes.
GO and KEGG Pathway Enrichment Analysis
GO and KEGG pathway enrichment analyses were performed to further explore the biological functions of the identified DEGs. The biological process category of the GO analysis results showed that up-regulated DEGs were significantly enriched in phosphatidylinositol 3-kinase signaling, mammary gland alveolus development and cell adhesion. For the cellular component category, these DEGs were enriched in the extracellular matrix, basement membrane and extracellular exosome. Moreover, the DEGs were significantly enriched in platelet-derived growth factor binding and extracellular matrix structural constituent under the molecular function category (Figure 3A). In addition, the most significantly enriched GO terms for down-regulated genes were cell adhesion and extracellular matrix organization (BP); plasma membrane and integrin complex (CC); T cell receptor binding and cell adhesion molecule binding (MF; Figure 3B).
Figure 3
GO and KEGG pathway enrichment analyses for DEGs.
Notes: (A) Chord plot of GO enrichment analysis of up-regulated genes. (B) Chord plot of GO enrichment analysis of down-regulated genes. (C) Dot plot of KEGG enrichment analysis of up-regulated genes. (D) Dot plot of KEGG enrichment analysis of down-regulated genes.
Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
GO and KEGG pathway enrichment analyses for DEGs.Notes: (A) Chord plot of GO enrichment analysis of up-regulated genes. (B) Chord plot of GO enrichment analysis of down-regulated genes. (C) Dot plot of KEGG enrichment analysis of up-regulated genes. (D) Dot plot of KEGG enrichment analysis of down-regulated genes.Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.According to KEGG pathway enrichment analysis, the up-regulated DEGs were significantly enriched in focal adhesion, extracellular cell matrix (ECM)-receptor interaction and PI3K-AKT signaling pathway (Figure 3C). Downregulated DEGs were enriched in cell adhesion molecules (CAMs) and T cell receptor signaling pathway (Figure 3D).
PPI Network Construction and Hub Gene Identification
To investigate the interactions between proteins encoded by the DEGs, a PPI network was constructed according to STRING database, which involved 59 nodes and 142 edges (Figure 4A). Thereafter, A plug-in of Cytoscape, MCODE was employed to conduct clustering analysis. Four functional modules were screened out from the PPI network (Figure 4B–E). To further understand the potential biological functions of these modules, we performed another KEGG pathway enrichment analyses. The results demonstrated that these modules play roles in cell adhesion, ECM–receptor interaction, pathways in cancer and T-cell immunity (Table 1).
Figure 4
PPI network construction and hub genes identification.
Table 1
KEGG Enrichment Analysis of Top four Modules Identified from PPI Network
Module
KEGG Entry
Description
Count
FDR
Module 1
hsa04660
T cell receptor signaling pathway
8
2.06E-13
hsa04664
Fc epsilon RI signaling pathway
2
0.001979
Module 2
hsa04974
Protein digestion and absorption
6
1.51E-11
hsa04512
ECM-receptor interaction
6
1.51E-11
hsa04510
Focal adhesion
6
1.89E-09
Module 3
hsa04512
ECM-receptor interaction
4
2.03E-07
hsa04510
Focal adhesion
4
4.12E-06
hsa04514
Cell adhesion molecules (CAMs)
3
0.000126
Module 4
hsa05200
Pathways in cancer
3
0.006485
hsa04012
ErbB signaling pathway
2
0.007474
KEGG Enrichment Analysis of Top four Modules Identified from PPI NetworkPPI network construction and hub genes identification.Subsequently, another plug-in of Cytoscape, cytohubba was used for hub genes identification. From the whole PPI network, six hub genes (CD3D, CD3E, CD3G, FYN, GRAP2 and ITK) were screened out according to the ranking by MMC method. The sub-network of the identified hub genes is shown in Figure 4F.
TCGA Survival Analysis and qRT-PCR Validation
To validate the association between identified hub genes expression and patients’ clinical outcomes, 135 TNBC samples from TCGA database were derived into high- and low-expression groups according to the median expression level of each hub gene, respectively. The online tool GEPIA2 was employed to conduct the survival analysis. Among the six hub genes, CD3D, CD3E, GRAP2 and ITK were significantly associated with the overall survival of TNBC (log-rank P < 0.05; Figure 5A–D). Furthermore, a survival map of multiple malignancies is shown in Figure 5E. According to the result, the identified hub genes tended to increase the risk of tumor diseases.
Figure 5
Hub genes validation using TCGA data and clinical specimens.
Notes: (A) Kaplan–Meier survival analysis of CD3D. (B) Kaplan–Meier survival analysis of CD3E. (C) Kaplan–Meier survival analysis of GRAP2. (D) Kaplan–Meier survival analysis of ITK. (E) Survival map of the hub genes in multiple malignancies. (F) Expression levels of the hub genes in clinical specimens detected by qRT-PCR.
Abbreviations: TCGA, The Cancer Genome Atlas; qRT-PCR, quantitative reverse transcriptional polymerase chain reaction.
Hub genes validation using TCGA data and clinical specimens.Notes: (A) Kaplan–Meier survival analysis of CD3D. (B) Kaplan–Meier survival analysis of CD3E. (C) Kaplan–Meier survival analysis of GRAP2. (D) Kaplan–Meier survival analysis of ITK. (E) Survival map of the hub genes in multiple malignancies. (F) Expression levels of the hub genes in clinical specimens detected by qRT-PCR.Abbreviations: TCGA, The Cancer Genome Atlas; qRT-PCR, quantitative reverse transcriptional polymerase chain reaction.Finally, we validated the expression of the hub genes in clinical samples. As demonstrated in Figure 5F, the relative expression level of CD3D, CD3E, CD3G, FYN, GRAP2 and ITK in TNBC samples were significantly lower than in non-TNBC samples (P < 0.05), and this result was consistent with the differential expression analysis.
Discussion
In the present study, four TNBC datasets were downloaded from GEO, and a total of 134 DEGs (88 up- and 46 down-regulated genes) were identified. The results of GO and KEGG enrichment analyses showed that the DEGs were associated with various cancer-related functions and pathways, such as cell adhesion, ECM–receptor interaction, PI3K-AKT signaling pathway and T-cell immunity. Thereafter, PPI network of the identified DEGs was constructed based on STRING database, and six hub genes were screened out from it using MCC method. Survival analysis and qRT-PCR validation further supported the robustness of the above results.Due to the development and popularization of the next-generation sequencing technologies, multiple transcriptomics studies of TNBC have been carried out. For instance, in a study based on the original TNBC dataset GSE76275 from GEO,17 207 DEGs were identified, and SOX8, AR, C9orf152, NRK and RAB30 were proposed as hub genes. Another bioinformatics analysis on TNBC transcriptome involved three GEO datasets (GSE38959, GSE45827, and GSE65194), and CCND1 was considered as the potential key gene associated with TNBC prognosis.18 Comparing with these published similar studies, the present study has some advantages: First, we employed the RRA method to analyze the overlapping DEGs from the four datasets. To the best of our knowledge, this algorithm has not been used for TNBC vs non-TNBC transcriptomic expression analysis. The RRA method is parameter-free and robust to outliers, noise and errors. Secondly, the present study involved 77 TNBC and 186 non-TNBC samples from four microarray datasets, and this sample size is larger than most of the other similar studies. These features ensured the credibility of our results.In the present study, we screened out six hub genes from the whole PPI network. According to the existing studies, the hub genes play key roles in multiple cancer-related biological processes. CD3D, CD3E and CD3G are located in the same cluster on chromosome 11, and encode a group of polypeptide named CD3-delta, CD3-gamma and CD3 epsilon. These polypeptides, along with the T-cell receptor alpha/beta and gamma/delta heterodimers, form the T-cell receptor/CD3 complex (TCR/CD3 complex) which plays an important role in coupling antigen recognition to several intracellular signalling pathways.19,20 It was reported that CD3D, CD3E and CD3G were associated with improved OS in five cancer types, including breast cancer.21 According to the survival analysis of TCGA data, this is consistent with our result. Another immune-associated hub gene, ITK, encodes an essential intracellular tyrosine kinase expressed in T-cells, which is thought to play a role in T-cell proliferation and differentiation.22,23 Research of ibrutinib, an inhibitor of both ITK and BTK, demonstrated that inhibiting ITK could shift the balance between Th1 and Th2 T cells and potentially enhance antitumor immune responses.24 However, the effect of ITK on breast cancer has not been elucidated so far.GRAP2 locates in chromosome 22q13.1, and the protein it encodes is an adaptor-like protein involved in leukocyte-specific protein-tyrosine kinase signaling.25
FYN encodes a membrane-associated tyrosine kinase that has been implicated in the control of cell growth, and it has been recognized as an oncogene in various tumor diseases, including melanoma, glioblastoma, squamous cell carcinoma, prostate and breast cancers.26–28 Interestingly, according to the survival and expression analysis performed in our study, we observed that FYN was down-regulated in TNBC comparing with non-TNBC, while the expression of FYN was not associated with TNBC prognosis. Further study is still required to validate the bio-functions of the hub genes in vitro and in vivo.Our study identified six hub genes and various signalling pathways that were strongly associated with T-cell immunity. Accumulating evidence has demonstrated that immuno-biomarkers are of prognostic significance in many tumor types, including breast cancer.29 Unlike other intrinsic subtypes of breast cancer, TNBC is characterized by a higher degree of tumor-infiltrating lymphocytes (TILs),30 and multiple series of investigations have shown that higher TILs are associated with improved overall survival and higher response rate.31–34 In combination with the existing studies, our findings further indicated the important role of immune-related factors in TNBC.
Conclusion
The present study has identified several key genes (CD3D, CD3E, CD3G, FYN, GRAP2 and ITK) which might be considered as novel and potential biomarkers of TNBC. Survival analysis and qRT-PCR validation supported the strong and robust association between the hub genes and TNBC. These results may provide a novel understanding of TNBC tumorigenesis, and further study is urgently needed to elucidate the underlying molecular mechanisms.
Authors: Shantanu Banerji; Kristian Cibulskis; Claudia Rangel-Escareno; Kristin K Brown; Scott L Carter; Abbie M Frederick; Michael S Lawrence; Andrey Y Sivachenko; Carrie Sougnez; Lihua Zou; Maria L Cortes; Juan C Fernandez-Lopez; Shouyong Peng; Kristin G Ardlie; Daniel Auclair; Veronica Bautista-Piña; Fujiko Duke; Joshua Francis; Joonil Jung; Antonio Maffuz-Aziz; Robert C Onofrio; Melissa Parkin; Nam H Pho; Valeria Quintanar-Jurado; Alex H Ramos; Rosa Rebollar-Vega; Sergio Rodriguez-Cuevas; Sandra L Romero-Cordoba; Steven E Schumacher; Nicolas Stransky; Kristin M Thompson; Laura Uribe-Figueroa; Jose Baselga; Rameen Beroukhim; Kornelia Polyak; Dennis C Sgroi; Andrea L Richardson; Gerardo Jimenez-Sanchez; Eric S Lander; Stacey B Gabriel; Levi A Garraway; Todd R Golub; Jorge Melendez-Zajgla; Alex Toker; Gad Getz; Alfredo Hidalgo-Miranda; Matthew Meyerson Journal: Nature Date: 2012-06-20 Impact factor: 49.962
Authors: Bruce G Haffty; Qifeng Yang; Michael Reiss; Thomas Kearney; Susan A Higgins; Joanne Weidhaas; Lyndsay Harris; Willam Hait; Deborah Toppmeyer Journal: J Clin Oncol Date: 2006-11-20 Impact factor: 44.544
Authors: Benjamin F Tillman; James M Pauff; Gowri Satyanarayana; Mahsa Talbott; Jeremy L Warner Journal: Eur J Haematol Date: 2018-02-06 Impact factor: 2.997
Authors: Idit Sagiv-Barfi; Holbrook E K Kohrt; Debra K Czerwinski; Patrick P Ng; Betty Y Chang; Ronald Levy Journal: Proc Natl Acad Sci U S A Date: 2015-02-17 Impact factor: 11.205
Authors: L Yang; X Wu; Y Wang; K Zhang; J Wu; Y-C Yuan; X Deng; L Chen; C C H Kim; S Lau; G Somlo; Y Yen Journal: Oncogene Date: 2011-05-02 Impact factor: 9.867
Authors: L Malorni; P B Shetty; C De Angelis; S Hilsenbeck; M F Rimawi; R Elledge; C K Osborne; S De Placido; G Arpino Journal: Breast Cancer Res Treat Date: 2012-11-04 Impact factor: 4.872