
A 17-gene expression-based prognostic signature associated with the prognosis of patients with breast cancer: A STROBE-compliant study.

Jin-Xian Qian, Min Yu, Zhe Sun, Ai-Mei Jiang, Bo Long.

Abstract

This study aimed to identify reliable predictive biomarkers for patients with breast cancer (BC). A univariate Cox proportional hazards regression model was used to identify genes correlated with the overall survival (OS) of patients in the TCGA-BRCA cohort. Functional enrichment analysis was conducted to investigate the biological meaning of these survival-related genes. Patients in TCGA-BRCA were then randomly divided into a training set and a test set. A least absolute shrinkage and selection operator (LASSO) penalized Cox regression model was fitted, and the risk score of BC patients in this model was used to build a prognostic signature. The prognostic performance of the signature was evaluated in the training set, the test set, and an independent validation set (GSE7390). A total of 2519 genes were significantly associated with the OS of BC patients. Functional annotation of the 2519 genes suggested that they were associated with immune response and protein synthesis related gene ontology terms and pathways. Seventeen genes were identified in the LASSO Cox regression model and used to construct a 17-gene signature. Patients in the 17-gene signature low risk group had better OS and event-free survival than those in the 17-gene signature high risk group in the TCGA-BRCA cohort. The prognostic role of the 17-gene signature was confirmed in the validation cohort. A multivariable Cox proportional hazards regression model suggested that the 17-gene signature was an independent prognostic factor in BC. The 17-gene signature we developed could successfully classify patients into high- and low-risk groups, indicating that it might serve as a candidate biomarker in BC.

Year:  2020        PMID: 32282693      PMCID: PMC7220332          DOI: 10.1097/MD.0000000000019255

Source DB:  PubMed          Journal:  Medicine (Baltimore)        ISSN: 0025-7974            Impact factor:   1.817


Introduction

Breast cancer (BC), the second most frequent malignancy in females with an estimated 1,676,000 newly diagnosed cases annually, is common worldwide but is of special concern in developing countries with limited screening policies. Radical mastectomy is considered the frontline treatment for early-stage BC patients, and adjuvant treatment (chemotherapy, radiation therapy, hormone therapy, biotherapy, etc) is designed to treat micrometastatic disease. However, many patients are diagnosed with advanced BC, which deprives them of the chance of surgical treatment, and the disease remains incurable. Currently, the conventional histopathological test is considered the most common and reliable method for prognosis prediction and treatment decisions in patients with BC. However, it often fails to explain the diversity of genetic burden across individual patients and to classify patients into different risk groups. With the development of high-throughput sequencing, several molecules, including SPAG9, neogenin, thioredoxin1 (Trx1), and AGR3, have served as biomarkers that are correlated with the clinical stage, diagnostic subtype, and prognosis of patients with BC. Nevertheless, few biomarkers have been effectively applied in clinical settings. Least absolute shrinkage and selection operator (LASSO) is a regression analysis method for simultaneous feature selection and regularization, which has been used to screen a variety of tumor-associated biomarkers. Herein, we retrospectively analyzed the gene expression profiles of 1080 BC patients and developed a 17-gene based signature, demonstrating that the 17-gene signature might be a candidate prognostic factor in patients with BC.

Methods and materials

BC gene expression studies

BC gene expression profiles were obtained from the TCGA-BRCA cohort and GSE7390. In the TCGA-BRCA cohort, we included primary BC patients with documented survival information, and we used this cohort to form the training set and test set. GSE7390, profiled on the Affymetrix Human Genome U133A Array, included the gene expression profiles of 198 patients and was treated as an independent validation cohort in this study. Ethical approval was not necessary for this study.

Model training and signature construction

A univariate Cox proportional hazards regression model (CoxPH) was used to identify genes related to the overall survival (OS) of BC patients in the TCGA-BRCA cohort, and the TCGA-BRCA cohort was then randomly divided into two cohorts (training set and test set) in a 3:2 ratio. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted using DAVID (v6.8) to explore the biological significance of these survival-related genes. Subsequently, we fitted a LASSO penalized CoxPH model in the training set using the R package "glmnet." Genes whose coefficients were not shrunk exactly to zero were used to form a multigene prognostic signature. The risk score of each BC patient was calculated based on the coefficients of each gene in the LASSO penalized CoxPH model.
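The risk score described above is commonly computed as a linear combination of gene expression values weighted by the fitted coefficients. The following Python sketch illustrates that calculation under stated assumptions: the gene names, coefficients, and expression values are hypothetical placeholders rather than values from this study, and the study itself used the R package glmnet rather than this code.

```python
def select_signature(coefficients):
    """Keep only genes whose LASSO coefficient was not shrunk exactly to zero."""
    return {gene: beta for gene, beta in coefficients.items() if beta != 0.0}


def risk_score(expression, signature):
    """Risk score = sum over signature genes of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in signature.items())


# Hypothetical fitted coefficients (placeholders, not values from the study).
fitted = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 0.0}
signature = select_signature(fitted)

# Hypothetical expression profile of a single patient.
patient = {"GENE_A": 2.0, "GENE_B": 2.0, "GENE_C": 10.0}
score = risk_score(patient, signature)
print(sorted(signature), score)
```

Because GENE_C has a zero coefficient in this toy example, it is dropped from the signature and does not contribute to the score.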

Evaluation of the prognostic performance of the 17-gene signature

According to the cutoff value derived from time-dependent receiver operating characteristic (ROC) curve analysis, BC patients were categorized into 2 risk groups (the 17-gene signature high risk group and the 17-gene signature low risk group). We compared the OS and event-free survival (EFS) of high risk and low risk BC patients in the training set, test set, and validation set (GSE7390) using the log-rank test and Kaplan–Meier (KM) curves. The risk score of each BC patient in the validation set was estimated according to the coefficients of the 17 genes in the LASSO penalized CoxPH model fitted in the training set. Univariate and multivariable CoxPH models were also fitted.
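The grouping and survival comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code of the study (which was run in R): the function names are hypothetical, the cutoff is passed in as a plain number rather than derived from a time-dependent ROC analysis, and the log-rank test is the standard two-group version with 1 degree of freedom.

```python
import math


def assign_risk_group(score, cutoff):
    """Dichotomize a risk score; scores above the cutoff are 'high' risk."""
    return "high" if score > cutoff else "low"


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` steps down only at the three event times, and the censored observation at time 2 just reduces the number at risk.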

Gene set enrichment analysis (GSEA)

Finally, to identify the mechanisms through which the 17-gene signature might influence the survival of BC patients, we conducted GSEA in the training set and test set based on the risk score of each BC patient. Patients were assigned to 2 risk groups as described above.

Statistical analysis

All statistical analyses in the present study were performed using R 3.5.2. For the survival analyses, including Kaplan–Meier curves, univariate CoxPH models, and multivariable CoxPH models, P values less than .05 were considered statistically significant. For GSEA, gene sets with a nominal P value less than .05 and a false discovery rate less than 25% were considered significantly enriched.
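The GSEA significance rule stated above combines two thresholds, which the short Python sketch below applies to a list of results. The field names (`name`, `nominal_p`, `fdr`) and the example records are assumptions made for illustration; they are not output from the study.

```python
def significantly_enriched(gene_sets, p_cutoff=0.05, fdr_cutoff=0.25):
    """Return names of gene sets passing both the nominal P and FDR thresholds."""
    return [
        gs["name"]
        for gs in gene_sets
        if gs["nominal_p"] < p_cutoff and gs["fdr"] < fdr_cutoff
    ]


# Hypothetical GSEA results (placeholder names and values).
results = [
    {"name": "SET_1", "nominal_p": 0.01, "fdr": 0.10},
    {"name": "SET_2", "nominal_p": 0.04, "fdr": 0.30},  # fails the FDR threshold
    {"name": "SET_3", "nominal_p": 0.20, "fdr": 0.20},  # fails the P threshold
]
print(significantly_enriched(results))
```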

Results

Characteristics of patients with BC in the 3 experimental cohorts

The TCGA-BRCA cohort included a total of 1248 samples, 1101 of which were BC samples; among the 1101 BC samples, OS information was available for 1080. After randomization, 648 cases and 432 cases were assigned to the training set and the test set, respectively (the baseline features of patients in the training set and test set are shown in supplementary Table 1 and supplementary Table 2). The validation set consisted of a total of 199 BC patients (median age [range], 46 [24-60] years); the baseline features of BC patients in the validation set are summarized in supplementary Table 3.
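The 648 and 432 cases correspond to a 60/40 partition of 1080 patients. A random split of this kind can be sketched in Python as follows; the function name, the fixed seed, and the integer patient identifiers are assumptions for illustration, and the study performed its randomization in R.

```python
import random


def split_cohort(patient_ids, train_fraction=0.6, seed=0):
    """Shuffle patient IDs reproducibly and split them into training and test sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder identifiers standing in for the 1080 patients with OS information.
train_ids, test_ids = split_cohort(range(1080))
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the partition reproducible, which matters because the signature is fitted on the training set only and then evaluated on the held-out test set.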

Construction of the 17-gene signature

A total of 2519 genes were significantly related to the OS of BC patients in the univariate CoxPH model. The KEGG pathway and GO enrichment analyses suggested that the 2519 genes were mostly enriched in immune-related pathways (antigen processing and presentation, primary immunodeficiency, allograft rejection, and graft-versus-host disease, Fig. 1A) and GO terms (nuclear-transcribed mRNA catabolic process, SRP-dependent cotranslational protein targeting to membrane, viral transcription, nonsense-mediated decay, translational initiation, translation, inflammatory response, T cell receptor signaling pathway, rRNA processing, adaptive immune response, immune response, interferon-gamma-mediated signaling pathway, regulation of immune response, and T cell costimulation, Fig. 1B), indicating that these genes were mostly related to immune response and protein synthesis. The 2519 genes were then entered into a LASSO penalized CoxPH model, and 17 genes were retained after feature selection in this model. We therefore formed a 17-gene signature according to the risk score of each BC patient (supplementary Table 4 and Fig. 2).

Figure 1

Functional annotation of the 2519 genes correlated with the overall survival of breast cancer patients. (A) KEGG signaling pathways in which the 2519 genes were enriched. (B) GO terms in which the 2519 genes were enriched. The more significant the P value of a GO term or KEGG pathway, the closer the color of the corresponding bar is to red. The count of a GO term or KEGG pathway is the number of genes enriched in that term.

Figure 2

The details of the 17-gene prognostic signature. (A) The risk score of breast cancer patients calculated based on the LASSO penalized Cox proportional hazards regression model in the 17-gene signature low risk group and 17-gene signature high risk group. (B) The survival status and time of breast cancer patients in the 17-gene signature low risk group and 17-gene signature high risk group. (C) The expression levels of the 17 genes in the 17-gene signature low risk group and 17-gene signature high risk group.


The prognostication performance of the 17-gene signature

The KM curves indicated that patients in the 17-gene signature low risk group had superior OS to those in the 17-gene signature high risk group in the training set (univariate CoxPH, HR = 94.06003, 95% CI: 33.54986–263.7057; log-rank P = 5.67E−18, supplementary Table 5 and Fig. 3A) and test set (univariate CoxPH, HR = 14.71657, 95% CI: 3.273202–66.16684; log-rank P = .000455, supplementary Table 6 and Fig. 3B). For EFS, patients in the 17-gene signature low risk group had better EFS than those in the 17-gene signature high risk group in the training set (univariate Cox regression model, HR = 6.768481, 95% CI: 1.563461–29.30188; log-rank P = .010537, supplementary Table 7 and Fig. 3C) and test set (univariate Cox regression model, HR = 6.052806, 95% CI: 1.321835–27.71637; log-rank P = .020374, supplementary Table 8 and Fig. 3D). Furthermore, we analyzed the predictive value of the 17-gene signature in GSE7390. As shown in Figure 4, the OS of patients in the 17-gene signature low risk group was significantly longer than that of patients in the 17-gene signature high risk group (log-rank P = .047). We also divided BC patients in TCGA-BRCA (including the training set and test set) into triple negative BC (TNBC) and non-TNBC, and investigated the survival relevance of the 17-gene signature in the 2 subgroups. As shown in supplementary Figure 1E–H, the 17-gene signature could significantly stratify both TNBC and non-TNBC patients into different survival groups in the training set and test set.

In the validation set GSE7390, the pathological stage and the complete statuses of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) were not reported; thus, we could not perform the same subgroup analysis in GSE7390 as in the training set and test set. However, the ER status of patients in GSE7390 was documented, so we divided the patients into an ER-positive group and an ER-negative group and investigated the survival differences between the 17-gene signature low risk group and the 17-gene signature high risk group in the 2 subgroups. As shown in supplementary Figure 2, both ER-positive and ER-negative patients in the 17-gene signature low risk group showed better overall survival than those in the 17-gene signature high risk group. The above results indicated that the 17-gene signature had significant prognostic capability in patients with BC.

Figure 3

The influence of the 17-gene signature on the overall survival and event-free survival in the training set and test set. (A) Overall survival in the training set. (B) Overall survival in the test set. (C) Event-free survival in training set. (D) Event-free survival in the test set.

Figure 4

The influence of the 17-gene signature on the overall survival in the validation set.


Results of GSEA

Finally, to investigate the biological basis by which the 17-gene signature affects the survival of BC patients and to lay the foundation for future studies, we performed GSEA in the training set and test set. As shown in supplementary Table 9, BC samples in the 17-gene signature high risk group were significantly associated with the unfolded protein response, mTORC1 signaling, the MYC signaling pathway, the E2F signaling pathway, and the G2M checkpoint.

Discussions

As mentioned above, BC is a malignant disease with high incidence, limited approaches to screening, diagnosis, and management, and poor prognosis. Therefore, the discovery of novel tumor markers that could predict the survival of patients is of great significance for the treatment and prognosis of patients with BC. In this study, we identified 2519 genes linked to the OS of patients with BC, and functional annotation of the 2519 genes indicated that they participated in immune response and protein synthesis. We then built a 17-gene (NTRK3, C4orf7, ACTL8, CLEC3A, PIGR, CEL, LRP1B, TFPI2, PAX7, NPY1R, ZNF385B, FOXJ1, CDC20B, ALOX15, ELOVL2, IYD, and IGJ) based prognostic signature in BC. Moreover, we assessed the prognostic performance of the 17-gene signature in 3 different cohorts (training, test, and validation set), and the results suggested that the 17-gene signature might be a candidate prognostic factor in patients with BC. Compared with the conventional biomarkers in BC, including ER, PR, and HER2, our 17-gene signature has better clinical applicability, and it could be applied in the clinic using immunohistochemical methods alone. Meanwhile, among the 17 genes, 12 genes have been reported in the carcinogenesis and progression of BC, including NTRK3, ACTL8, CLEC3A, LRP1B, TFPI2, PAX7, NPY1R, ZNF385B, FOXJ1, CDC20B, ALOX15, and ELOVL2, indicating that the 17-gene based prognostic signature is reliable.

The results of GSEA suggested that the 17-gene signature might affect the survival of BC patients through the unfolded protein response, mTORC1 signaling, the MYC signaling pathway, the E2F signaling pathway, and the G2M checkpoint. These mechanisms have previously been reported in BC. Shajahan-Haq et al demonstrated that MYC regulated the unfolded protein response in endocrine resistant BC, and Notte et al suggested that the unfolded protein response was correlated with Taxol-induced autophagy and apoptosis in BC cells. Davis et al, Guichard et al, and Lu et al demonstrated that mTORC1 signaling was dysregulated in BC, which suggested a possible treatment target for BC. Rennhack et al showed that E2F signaling mediated metastasis in HER2-positive BC patients. Kawamoto et al demonstrated that cyclin B1 and CDC28A were differently expressed between normal breast and BC, which could be applied to differentiate between precancerous human breast lesions and advanced BC. Thus, the results of GSEA gave us a clearer and deeper understanding of the effect of the 17-gene signature in BC and laid the foundation for future studies.

Our study has several limitations. Firstly, the mechanisms by which the 17-gene signature affects the survival of BC patients are derived from statistical inference, with no in vivo or in vitro experimental validation. We plan to further explore such mechanisms in vivo and in vitro in future studies. Secondly, the prognostic performance of the 17-gene signature has not been confirmed in clinical practice. To this end, our subsequent research will focus on testing the performance of the 17-gene signature in clinical trials. In summary, the 17-gene signature we proposed was effective in categorizing BC patients into different survival groups, and it might be treated as a candidate prognostic factor in clinical settings.

Author contributions

Conceptualization: Bo Long. Data curation: Jin-Xian Qian, Min Yu, Bo Long. Formal analysis: Min Yu, Ai-Mei Jiang. Investigation: Jin-Xian Qian, Min Yu. Methodology: Jin-Xian Qian. Resources: Zhe Sun. Software: Jin-Xian Qian, Min Yu. Validation: Zhe Sun, Ai-Mei Jiang. Visualization: Zhe Sun. Writing – original draft: Jin-Xian Qian. Writing – review & editing: Ai-Mei Jiang, Bo Long.

References

1.  Neoadjuvant treatment of breast cancer--Clinical and research perspective (review).

Authors:  Sibylle Loibl; Carsten Denkert; Gunter von Minckwitz
Journal:  Breast       Date:  2015-09-19       Impact factor: 4.380

2.  Salinomycin suppresses LRP6 expression and inhibits both Wnt/β-catenin and mTORC1 signaling in breast and prostate cancer cells.

Authors:  Wenyan Lu; Yonghe Li
Journal:  J Cell Biochem       Date:  2014-10       Impact factor: 4.429

3.  AZD2014, an Inhibitor of mTORC1 and mTORC2, Is Highly Effective in ER+ Breast Cancer When Administered Using Intermittent or Continuous Schedules.

Authors:  Sylvie M Guichard; Jon Curwen; Teeru Bihani; Celina M D'Cruz; James W T Yates; Michael Grondine; Zoe Howard; Barry R Davies; Graham Bigley; Teresa Klinowska; Kurt G Pike; Martin Pass; Christine M Chresta; Urszula M Polanska; Robert McEwen; Oona Delpuech; Stephen Green; Sabina C Cosulich
Journal:  Mol Cancer Ther       Date:  2015-09-10       Impact factor: 6.261

4.  ETV6-NTRK3-mediated breast epithelial cell transformation is blocked by targeting the IGF1R signaling pathway.

Authors:  Cristina E Tognon; Aruna M Somasiri; Valentina E Evdokimova; Genny Trigo; Evett E Uy; Nataliya Melnyk; Joan M Carboni; Marco M Gottardis; Calvin D Roskelley; Michael Pollak; Poul H B Sorensen
Journal:  Cancer Res       Date:  2010-12-09       Impact factor: 12.701

5.  [Lys(DOTA)4]BVD15, a novel and potent neuropeptide Y analog designed for Y1 receptor-targeted breast tumor imaging.

Authors:  Brigitte Guérin; Véronique Dumulon-Perreault; Marie-Claude Tremblay; Samia Ait-Mohand; Patrick Fournier; Céléna Dubuc; Simon Authier; François Bénard
Journal:  Bioorg Med Chem Lett       Date:  2009-12-23       Impact factor: 2.823

6.  MYC regulates the unfolded protein response and glucose and glutamine uptake in endocrine resistant breast cancer.

Authors:  Ayesha N Shajahan-Haq; Katherine L Cook; Jessica L Schwartz-Roberts; Ahreej E Eltayeb; Diane M Demas; Anni M Warri; Caroline O B Facey; Leena A Hilakivi-Clarke; Robert Clarke
Journal:  Mol Cancer       Date:  2014-10-23       Impact factor: 27.401

7.  Neogenin expression is inversely associated with breast cancer grade in ex vivo.

Authors:  Wanying Xing; Qiang Li; Rangjuan Cao; Zheli Xu
Journal:  World J Surg Oncol       Date:  2014-11-22       Impact factor: 2.754

8.  Conserved E2F mediated metastasis in mouse models of breast cancer and HER2 positive patients.

Authors:  Jonathan Rennhack; Eran Andrechek
Journal:  Oncoscience       Date:  2015-11-10

9.  Evaluation of copy-number variants as modifiers of breast and ovarian cancer risk for BRCA1 pathogenic variant carriers.

Authors:  Logan C Walker; Louise Marquart; John F Pearson; George A R Wiggins; Tracy A O'Mara; Michael T Parsons; Daniel Barrowdale; Lesley McGuffog; Joe Dennis; Javier Benitez; Thomas P Slavin; Paolo Radice; Debra Frost; Andrew K Godwin; Alfons Meindl; Rita Katharina Schmutzler; Claudine Isaacs; Beth N Peshkin; Trinidad Caldes; Frans Bl Hogervorst; Conxi Lazaro; Anna Jakubowska; Marco Montagna; Xiaoqing Chen; Kenneth Offit; Peter J Hulick; Irene L Andrulis; Annika Lindblom; Robert L Nussbaum; Katherine L Nathanson; Georgia Chenevix-Trench; Antonis C Antoniou; Fergus J Couch; Amanda B Spurdle
Journal:  Eur J Hum Genet       Date:  2017-02-01       Impact factor: 4.246

10.  Comprehensive molecular portraits of human breast tumours.

Authors: 
Journal:  Nature       Date:  2012-09-23       Impact factor: 49.962
