Haiyan Gao1, Mei Yang1, Xiaolan Zhang1. 1. Department of Breast Surgery, Changzhou No. 2 People's Hospital, Affiliated to Nanjing Medical University, Changzhou, Jiangsu 213000, P.R. China.
Abstract
The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancer patients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using a Limma package and the hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was exploited based on the statistically significant pathways and its prognostic function was tested using train set and verified using the gene expression data and survival data of Luminal A breast cancer patients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. The DEGs may be used to effectively distinguish Luminal A samples with different prognoses verified by hierarchical clustering analysis. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity to both the train and test samples. In conclusion the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.
The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancerpatients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using a Limma package and the hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was exploited based on the statistically significant pathways and its prognostic function was tested using train set and verified using the gene expression data and survival data of Luminal A breast cancerpatients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. The DEGs may be used to effectively distinguish Luminal A samples with different prognoses verified by hierarchical clustering analysis. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity to both the train and test samples. In conclusion the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.
Entities:
Keywords:
Luminal A breast cancer; differentially expressed genes; multigene prognostic assay; significant pathways
Breast cancer is the most commonly diagnosed cancer in women and approximately accounts for 29% of all new cancers in women, and it is the leading cause of cancer-related death in women worldwide (1). Breast cancer is considered as a heterogeneous disease but not a single disease at molecular and clinical levels (2,3). The well-known characteristics of breast cancer-associated factors include pathologic and clinical characteristics of the primary tumor, tumor histology, axillary lymph node (ALN) status, estrogen receptor (ER) content, progesterone receptor (PR) content, content, tumorHER2 status, detectable metastatic disease, patient age, patient comorbid conditions, and menopausal status (3,4). Based on the determination of ER, PR, HER2, and Ki-67, breast cancer is often to be accepted as four subtypes, namely Luminal A, Luminal B, Erb-B2 overexpression and Basal-like (also known as Triple negative breast cancer) according to St Gallen (3). Luminal A (ER positive, PR positive, HER2/neu negative) is the most common subtype, accounting for more than 50% of all breast cancerpatients (5,6).Multigene predictors have been introduced by various technologies, including immunohistochemistry (IHC), reverse transcription-quantitative polymerase chain reaction (RT-qPCR), fluorescence in situ hybridization (FISH) and genomic microarrays (7). Nowadays, some microarray-based multigene predictors have been developed as predictors of response to hormonal therapy (8,9), predictors of response to multiagent cytotoxic chemotherapy (10–13) and independent prognostic biomarkers (14–16).Studies have shown that the recurrence score based on a 21-gene assay is a recurrence predictor for breast cancerpatients receiving adjuvant endocrine therapy (17–19). Recurrence score is an independent predict factor for the response to adjuvant chemotherapy (20,21). Patients with high scores could benefit from adjuvant treatments, whereas those with low scores could not regardless of the pathologic and clinical characteristics. In ATAC trial (22), the risk of recurrence score obtained using a 50-gene assay was seen to have an obvious relationship with the 10 years distant recurrence risk in postmenopausal breast cancerwomen treated with tamoxifen or aromatase inhibitors. In addition, a first commercialized microarray-based multigene assay containing 70 genes, primarily associated with proliferation, metastasis, invasion, stromal integrity, and angiogenesis, is approved by the FDA's new in diagnostic multivariate index assay classification (7).Though multigene predictors have been widely investigated and used for the breast cancer, there are still rare studies focusing on the prognosis of subtypes. According to retrospective analyses and authoritative guidelines, these subtypes are associated with different relapse-free survival and overall survival and the patients with different subtypes should be administrated with different systemic treatment strategies (3). In this study, we aimed to utilize microarray profiling to investigate potential biomarkers that are differentially expressed in women with Luminal A-like breast cancer based on significant pathways analysis through gene expression profiles analysis. To validate the ability of the candidate multigene assay for the prediction of clinical outcomes, the gene expression data and survival data of Luminal A breast cancerpatients were downloaded from Gene Expression Omnibus for analysis.
Materials and methods
Gene expression data
The gene expression profiles of breast cancerpatients were downloaded from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) database with the deadline of December 27, 2016, including 20,501 genes obtained from 1,160 samples (1,041 tumor tissue samples, 112 normal tissue samples and 7 peripheral blood samples). According to the clinical information of ER, PR and HER2 information (23), 370 Luminal A breast cancer samples were screened.
Data processing and differentially expressed genes (DEGs) identifying
The expression profile data of Luminal A patients were normalized. Z-score correction method was utilized to rule out the difference at gene expression level (24). A total of 249 Luminal A samples from alive patients were assigned as good outcome group; whereas a total of 47 Luminal A samples from dead patients were assigned as poor outcome group. P<0.01 and |log2 Fold-Change (FC)| >1 were regarded as the cut-off criteria to screen out DEGs between the good and poor outcome groups using LIMMA package (25).
Hierarchical clustering analysis
Hierarchical clustering analysis was conducted for the DEGs using heatmap2 package in R language (26) and the result was visualized using the form of heatmap.
Identifying statistically significant pathways
The pathway information were download from Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.kegg.jp/kegg/pathway.html) database on March 1, 2017. The KEGG pathway enrichment analyses were performed based on pathway feature vector calculation (27) and nearest shrunken centroids (28). Briefly, the KEGG pathway was scored using the expression values of the DEGs in all samples. The sample was projected by taking the upregulated score and downregulated score as coordinates. The accuracy of the good group and the poor group was evaluated by calculating the geometric center of the same sample and specifying the radius (27). The pathways with accuracy more than 80% in these two groups were screened. The statistically significant pathways were recognized by calculating the Ratio of rank and converting to P-value according to the random 10,000 times perturbation of the background library (train set samples) (28).
Identifying prognostic biomarkers and training
Survival analysis for Luminal A breast cancer samples in TCGA were performed using DEGs involved in the obtained significant pathways (29). The support vector machine (SVM) classification model was constructed using these DEGs. Meanwhile, the model was trained using the Luminal A samples and the receiver operating characteristic (ROC) curve was drawn.
Verification of multigene prognostic assay
The reliability and repeatability of the multigene assay were verified using the gene expression profiles of GSE2034 (https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE2034) and the survival data of Luminal A breast cancerpatients was downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) database. Besides, the SVM model was also utilized to test the multigene assay, and the accuracy of the model was analyzed using the ROC curve.
Results
DEGs for Luminal A breast cancer with good and poor outcome groups
Using LIMMA package, a total of 300 DEGs were identified between 249 samples from good outcome group and 47 samples from poor outcome group, including 176 upregulated genes and 124 downregulated genes (Fig. 1). It can be observed from the figure that the data are homogenized to eliminate the deviation and the deviation scores of most genes were concentrated in −1 to 1. The genes distributed in the two branches were the most significant DEGs.
Figure 1.
The volcanic map of 300 DEGs. The abscissa represents log2 FC and the ordinate represents the negative logarithm of P-value. DEG, differentially expressed genes; FC, fold-change.
Clustering of DEGs
The 300 DEGs identified between good and poor outcome groups were selected for hierarchical clustering analysis. As presented in Fig. 2, 68 of 74 dead samples were clustered together and only 6 samples were clustered to the alive group with a precision of 92%. Meanwhile, 248 of 249 alive samples were clustered together and only 1 sample was clustered to the dead group with a precision of 99.6%. This result indicated that the DEGs could be used to effectively distinguish Luminal A samples with different prognoses.
Figure 2.
The heat map of 300 DEGs in the dead and alive samples. The abscissa represents the expression value of 300 DEGs in all samples and the ordinate represents the two sets of samples. Blue represents high expression, and yellow represents low expression. Meanwhile, the dead and alive samples were marked with red and green labels, respectively. DEG, differentially expressed genes.
Statistically significant pathways
Total 9 pathways with accuracy more than 80% in the two groups were screened. The significance analysis for these 9 pathways was conducted using Nearest Shrunken Centroids and the results are shown in Table I. It was observed that all of the 9 pathways significantly distinguished cancer samples with good outcome from cancer samples with poor outcome (P<0.05).
Table I.
Nine significant pathways according to Kyoto Encyclopedia of Genes and Genomes analysis nearest shrunken centroids.
Pathway
Initial precision
Average value
P-value
Colorectal cancer
0.7814
0.5775
1.27E-12
Oocyte meiosis
0.8050
0.6291
9.24E-09
Wnt signaling pathway
0.7033
0.5088
3.38E-06
Terpenoid backbone biosynthesis
0.6268
0.4298
5.80E-04
Endometrial cancer
0.5582
0.4769
7.90E-04
Cell cycle
0.5843
0.4648
2.81E-03
Proteasome
0.5422
0.4178
1.13E-02
Basal cell carcinoma
0.5715
0.4104
1.50E-02
Small cell lung cancer
0.5567
0.3995
1.72E-02
The average value column represents the average score of 100 iterations.
Identifying prognostic biomarkers based on the significant pathways
The DEGs involved in these 9 significant pathways were collected and a total of 18 DEGs were identified as prognostic biomarkers (Table II). Three genes [transcription factor 7-like 2 (TCF7L2), anterior parietal cortex (APC), and lymphocyte enhancer factor-1 (LEF1)], were involved in four pathways, one gene [cyclin E1 (CCNE1)] in three pathways, four [S-phase kinase-associated protein 2 (SKP2), humanfrizzled-7 (FZD7), polo-like kinase 1 (PLK1), and B cell lymphoma 2 (BCL2)] in two pathways and ten [proteasome activator subunit 4 (PSME4), prenyldiphosphate synthase, subunit 1 (PDSS1), promoters for humanDNA-PK cs (PRKDC), TTK protein kinase (TTK), minichromosome maintenance deficient 4 (MCM4), progesterone receptor (PGR), proteasome subunit alpha 7 (PSMA7), MDM2 proto-oncogene (MDM2), laminin subunit beta 2 (LAMB2), and proteasome 26S subunit, non-ATPase 7 (PSMD7)] in one pathway. Meanwhile, the annotation results for the 18 genes based on the TCGA database are shown in Fig. 3A. Simultaneously, the heat map was shown for the changes of the 18 DEGs in TCGA breast cancer samples (Fig. 3B). It can be seen that the expression of these 18 DEGs in TCGA breast cancer samples are almost upregulated.
Table II.
DEGs identified as prognostic biomarkers based on the significant pathways.
Wnt signaling pathway, colorectal cancer, endometrial cancer, B cell carcinoma
4
CCNE1
Cell cycle, oocyte meiosis, small cell lung cancer
3
SKP2
Cell cycle, small cell lung cancer
2
FZD7
Wnt signaling pathway, basal cell carcinoma
2
PLK1
Cell cycle, oocyte meiosis
2
BCL2
Colorectal cancer, small cell lung cancer
2
PSME4
Proteasome
1
PDSS1
Terpenoid backbone biosynthesis
1
PRKDC
Cell cycle
1
TTK
Cell cycle
1
MCM4
Cell cycle
1
PGR
Oocyte meiosis
1
PSMA7
Proteasome
1
MDM2
Cell cycle
1
LAMB2
Small cell lung cancer
1
PSMD7
Proteasome
1
Counts stand for the number of significant pathways. TCF7L2, transcription factor 7-like 2; APC, anterior parietal cortex; LEF1, lymphocyte enhancer factor-1; CCNE1, cyclin E1; SKP2, S-phase kinase-associated protein 2; FZD7, human frizzled-7; PLK1, polo-like kinase 1; BCL2, B cell lymphoma 2; PSME4, proteasome activator subunit 4; PDSS1, prenyldiphosphate synthase, subunit 1; PRKDC, promoters for human DNA-PK cs; TTK, TTK protein kinase; MCM4, minichromosome maintenance deficient 4; PGR, progesterone receptor; PSMA7, proteasome subunit α7; MDM2, MDM2 proto-oncogene; LAMB2, laminin subunit β2; PSMD7, proteasome 26S subunit, non-ATPase 7.
Figure 3.
The annotation results and the heat map of 18 biomarkers in Luminal A breast cancer samples from the TCGA database. (A) The annotation results of 18 biomarkers in the TCGA database. Red represents upregulation and blue represents downregulation. The proportion of each gene mutation is also marked. (B) The heat map for the changes of the 18 genes in TCGA breast cancer samples. Red represents upregulation and blue represents downregulation. TCGA, The Cancer Genome Atlas; TCF7L2, transcription factor 7-like 2; APC, anterior parietal cortex; LEF1, lymphocyte enhancer factor-1; CCNE1, cyclin E1; SKP2, S-phase kinase-associated protein 2; FZD7, human frizzled-7; PLK1, polo-like kinase 1; BCL2, B cell lymphoma 2; PSME4, proteasome activator subunit 4; PDSS1, prenyldiphosphate synthase, subunit 1; PRKDC, promoters for human DNA-PK cs; TTK, TTK protein kinase; MCM4, minichromosome maintenance deficient 4; PGR, progesterone receptor; PSMA7, proteasome subunit α7; MDM2, MDM2 proto-oncogene; LAMB2, laminin subunit β2; PSMD7, proteasome 26S subunit, non-ATPase 7.
Survival analysis for the 18 DEGs in train set
The survival analysis was conducted for the Luminal A breast patient samples with abnormal expression of these 18 genes and samples with normal expression of these 18 genes in TCGA database. As a result, the samples with abnormal expression of these 18 genes showed a significantly lower survival rate than samples with normal expression of these 18 genes (Fig. 4A, P=0.0319).
Figure 4.
Survival analysis for the 18 biomarkers in train set obtained from TCGA database. (A) Survival curve for two sets of samples from TCGA database using the 18 biomarkers. The red curve represents the samples with differently expressed biomarkers, and the blue curve represents the samples with normally expressed biomarkers. (B) ROC curve for 18 biomarkers in train set. The abscissa represents sensitivity and the ordinate represents specificity. ROC, receiver operating characteristic; TGCA, The Cancer Genome Atlas; TPR, true positive rate; FPR, false positive rate; AUC, area under the curve.
Additionally, the average AUC (area under curve) was 0.871 for the ROC of these 18 genes calculated from the random train set in TCGA database. The sensitivity was 0.913 and the specificity was 0.88 (Fig. 4B).
The verification of multigene prognostic assay in test set
The multigene assay was verified using the Luminal A breast patient samples with abnormal expression of these 18 genes and samples with normal expression of these 18 genes obtained from GSE2034 database. As a result, the samples with abnormal expression of these 18 genes showed a significantly lower survival rate than samples with normal expression of these 18 genes (Fig. 5A, P=0.0279). According to the ROC for the test set, the average AUC was 0.793 with sensitivity of 0.832 and specificity of 0.779 (Fig. 5B).
Figure 5.
Survival analysis for the 18 biomarkers in test set obtained from Gene Expression Omnibus database. (A) Survival curve for the samples from the gene expression profiles of GSE2034 using the 18 biomarkers. The red curve represents samples in high risk group with differently expressed biomarkers, and the blue curve represents samples in low risk group with normally expressed biomarkers. (B) ROC curve for 18 biomarkers in test set. The abscissa represents sensitivity and the ordinate represents specificity. ROC, receiver operating characteristic; TPR, true positive rate; FPR, false positive rate; AUC, area under the curve.
Discussion
With the promotion and put forward of precision medical, studies focusing on special subtypes of breast cancer are particularly meaningful. Regardless of the development of the multigene prognostic assay for breast cancer, our study still has a critical necessary for the prognostic study of breast cancer by focusing on the Luminal A subtype. According to our results, a total of 300 DEGs were identified between good prognosis group and poor prognosis group, including 176 upregulated genes and 124 downregulated genes. Based on the hierarchical clustering analysis, these DEGs could clearly distinguish the samples of the two groups. Meanwhile, the 18 genes predictors were involved in 9 significant pathways, including cancer-related pathways (colorectal cancer, endometrial cancer, basal cell carcinoma and small cell lung cancer), oocyte meiosis, Wnt signaling pathway, Terpenoid backbone biosynthesis, cell cycle and proteasome were selected. According to the survival analysis and ROC curve, the obtained 18-gene prognostic assay exhibited a good prognostic function with high sensitivity and specificity for the train set samples and verification set samples.TCF7L2 was identified as a key gene in the multigene prognostic assay and it was involved in four significant pathways, namely Wnt signaling pathway, colorectal cancer, endometrial cancer and basal cell carcinoma, based on the pathway analysis. TCF7L2, located on chromosome 10q25.2, plays a critical role in cancer cell growth, apoptosis, invasion and metastasis by regulating Wnt signalling (30,31). The regulation roles of TCF7L2 gene and related Wnt signaling pathway in breast cancer and its special subtypes have been widely confirmed (32–34). Several studies have exploited the association between the gene polymorphisms of TCF7L2 and the risk of breast cancer. Naidu et al (34) and Burwinkel et al (35) reported that TCF7L2 variants induced an increased breast cancer risk, and might be a potential candidate for the susceptibility of breast cancer. Additionally, the TCF7L2 protein is involved in the homeostasis of blood glucose and the gene polymorphisms of TCF7L2 are identified to increase the risk of type 2 diabetes (36). Diabetes have been reported to be associated with the increased risk of breast cancer and the similar results were seen in Luminal A and B subtypes (37). Consistent with these previous studies, TCF7L2 gene was screened as a key DEG between patients with good outcomes and poor outcomes in Luminal A breast cancer.In addition to TCF7L2, APC and LEF1 are also involved in the above mentioned four significant pathways. The association between APC and the prognosis of breast cancer has also been reported. In a study conducted by Müller et al, the methylated APC DNA indicated the worst prognosis in breast cancer samples from the train set and the independent test set (P<0.001) and it was considered as an potentially independent prognostic factor for breast cancer with poor outcomes (38). The prognostic importance of APC was also been confirmed by a study which discovered that the deletion of APC was associated with a poor overall survival of breast cancerpatients (39). According to the hierarchical analysis, the alterations of APC were significantly higher in ER-/PR-breast cancer compared with ER+/PR+ breast cancer samples (39). In general, patients with ER-/PR-have a worse outcome than patients with ER+/PR+. Thus, it is reasonable to speculate that the abnormal expression of APC might be associated poor outcome in Luminal A (ER+/PR+) breast cancer. LEF1 has been widely reported to promote cancer cell metastasis, mediate chemotactic invasion, and is associated with cancer progression (40). The LEF1 overexpression has been identified as a prognostic factor for poor outcome and increased risk of liver metastasis in primary colorectal cancer (41). Delaunay et al reported that the depletion of LEF1 strongly decreased the chemotactic potential of breast cancer cells and the expression level of LEF1 was associated with the risk of developing metastasis in breast cancerpatients (42). As expected, the expression of LEF1 was significantly different between good and poor outcome groups and it was screened as a key DEG according to our analysis.The 18-gene prognostic assay including the three key genes, TCF7L2, APC and LEF1, was also demonstrated with an accurate ability to distinguish good outcomes and poor outcomes in Luminal A breast cancer. To meet the background of the precision treatment of our study would enrich the research field of specific multi-gene prognosis for breast cancer subtypes. Further study with large samples should be conducted to verify the prognostic value of this 18-gene prognostic assay and prospective study is also needed.By conducting survival analysis, the 18-multigene assay showed effective distinguish effect on patients with different prognosis status in the low risk and high risk groups. However, the prognosis in these two groups was extraordinarily poor. The 18-gene prognostic assay should be verified in more Luminal A breast cancer samples with more typical prognosis status in further experiment.
Authors: Reshma Jagsi; Rita Abi Raad; Saveli Goldberg; Timothy Sullivan; James Michaelson; Simon N Powell; Alphonse G Taghian Journal: Int J Radiat Oncol Biol Phys Date: 2005-07-15 Impact factor: 7.038
Authors: Christine Desmedt; Fanny Piette; Sherene Loi; Yixin Wang; Françoise Lallemand; Benjamin Haibe-Kains; Giuseppe Viale; Mauro Delorenzi; Yi Zhang; Mahasti Saghatchian d'Assignies; Jonas Bergh; Rosette Lidereau; Paul Ellis; Adrian L Harris; Jan G M Klijn; John A Foekens; Fatima Cardoso; Martine J Piccart; Marc Buyse; Christos Sotiriou Journal: Clin Cancer Res Date: 2007-06-01 Impact factor: 12.531
Authors: Gong Tang; Steven Shak; Soonmyung Paik; Stewart J Anderson; Joseph P Costantino; Charles E Geyer; Eleftherios P Mamounas; D Lawrence Wickerham; Norman Wolmark Journal: Breast Cancer Res Treat Date: 2011-01-11 Impact factor: 4.872
Authors: Nandini Dey; Benjamin G Barwick; Carlos S Moreno; Maja Ordanic-Kodani; Zhengjia Chen; Gabriella Oprea-Ilies; Weining Tang; Charles Catzavelos; Kimberly F Kerstann; George W Sledge; Mark Abramovitz; Mark Bouzyk; Pradip De; Brian R Leyland-Jones Journal: BMC Cancer Date: 2013-11-10 Impact factor: 4.430
Authors: Sylvain Delaunay; Francesca Rapino; Lars Tharun; Zhaoli Zhou; Lukas Heukamp; Martin Termathe; Kateryna Shostak; Iva Klevernic; Alexandra Florin; Hadrien Desmecht; Christophe J Desmet; Laurent Nguyen; Sebastian A Leidel; Anne E Willis; Reinhard Büttner; Alain Chariot; Pierre Close Journal: J Exp Med Date: 2016-10-10 Impact factor: 14.307
Authors: Mitch Dowsett; Ivana Sestak; Elena Lopez-Knowles; Kalvinder Sidhu; Anita K Dunbier; J Wayne Cowens; Sean Ferree; James Storhoff; Carl Schaper; Jack Cuzick Journal: J Clin Oncol Date: 2013-07-01 Impact factor: 44.544