Salima Akter1, Tae Gyu Choi1, Minh Nam Nguyen1, Abel Matondo1, Jin-Hwan Kim1, Yong Hwa Jo1, Ara Jo1, Muhammad Shahid1, Dae Young Jun1, Ji Youn Yoo1, Ngoc Ngo Yen Nguyen1, Seong-Wook Seo1, Liaquat Ali2, Ju-Seog Lee3, Kyung-Sik Yoon1, Wonchae Choe1, Insug Kang1, Joohun Ha1, Jayoung Kim4, Sung Soo Kim1.
Abstract
Clinical applications of gene expression signatures in breast cancer prognosis remain limited because single training datasets have poor predictive strength and appropriate, invariable platforms are lacking. We propose a gene expression signature obtained by reducing baseline differences and analyzing common probes among three recent Affymetrix U133 Plus 2 microarray data sets. Using a newly developed supervised method, we identified a 92-probe signature that was associated with overall survival. It was robustly validated in four independent data sets and then repeated on three subgroups by incorporating 17 breast cancer microarray datasets. The signature was an independent predictor of patients' survival in univariate analysis [hazard ratio (HR) 1.927, 95% CI (1.237–3.002); p < 0.01] as well as in multivariate analysis after adjustment for clinical variables [HR 7.125, 95% CI (2.462–20.618); p < 0.001]. Consistent predictive performance was found in different multivariate models in an increased patient population (p = 0.002). The survival signature predicted a late metastatic feature through 5-year disease free survival (p = 0.006). We identified subtypes within the lymph node positive (p < 0.001) and ER positive (p = 0.01) patients that best reflected the invasive breast cancer biology. In conclusion, using the Common Probe Approach, we present a novel prognostic signature as a predictor of late recurrences in breast cancer.
Keywords: breast cancer; gene signature; microarray; prognosis
Year: 2015 PMID: 25883221 PMCID: PMC4558178 DOI: 10.18632/oncotarget.3525
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Breast cancer microarray datasets used in this study
| Data set | GEO Number | Origin/Year | Author | Paper Title | Chip type |
|---|---|---|---|---|---|
| Data set 1 | GSE42568 | Ireland, 2013 | | Breast Cancer Gene Expression Analysis | HG-U133_Plus_2 |
| Data set 2 | GSE20685 | Taiwan, 2011 | | Microarray-based molecular subtyping of breast cancer | HG-U133_Plus_2 |
| Data set 3 | GSE31448 | France, 2011 | | Down-regulation of ECRG4, a candidate tumor suppressor gene in human breast cancer | HG-U133_Plus_2 |
| Data set 4 | GSE12276 | Netherlands, 2009 | | Expression data from primary breast tumors | HG-U133_Plus_2 |
| Data set 5 | GSE48390 | Taiwan, 2013 | | Concurrent Gene Signatures for Han Chinese Breast Cancers | HG-U133_Plus_2 |
GSE, GEO data set number prefix; HG-U133_Plus_2, a type of oligonucleotide gene chip from Affymetrix.
Clinical and demographical characteristics of the patients
| Variable | Data set 1 | Data set 2 | Data set 3 | Data set 4 | Data set 5 |
|---|---|---|---|---|---|
| 104 | 327 | 246 | 195 | 81 | |
| 56 (31–90) | 46 (24–84) | 54.5 (24–84) | |||
| 63 (4.6–111) | 97 (5–169) | 54.2 (3.4–222.3) | 27 (3–115) | 50 (0.9–69.0) | |
| | 11 (10.5%) | 43 (17.5%) | |||
| | 40 (38.5%) | 84 (34.1%) | |||
| | 53 (51.0%) | 119 (47.2%) | |||
| | 3 (1.2%) | ||||
| | 67 (64.4%) | 139 (56.5%) | 53 (65.4%) | ||
| | 34 (32.7%) | 105 (42.7%) | 28 (34.6%) | ||
| | 3 (2.9%) | 2 (0.8%) | |||
| | 120 (48.8%) | ||||
| | 124 (50.4%) | ||||
| | 2 (0.8%) | ||||
| | 59 (56.7%) | 129 (52.4%) | |||
| | 45 (43.3%) | 115 (46.8%) | |||
| | 2 (0.8%) | ||||
| | 96 (92.3%) | ||||
| | 8 (7.7%) |
Data set 1, Ireland cohorts (GSE42568); Data sets 2 and 5, Taiwan cohorts (GSE20685 and GSE48390, respectively); Data set 3, France cohorts (GSE31448); Data set 4, The Netherlands cohorts (GSE12276); N/A, not available.
Figure 1. A workflow in this study
The significant GO biological pathways pointed to by the 92-probe signature
| ID | Name | No. of genes | p-value | Gene symbol |
|---|---|---|---|---|
| GO:0009725 | Response to hormone stimulus | 9 | 0.000124 | ADCY1, IGF1R, GATA3, RERG, TGFBR3, SERPINA1, ERBB4, NPY1R, ESR1 |
| GO:0048732 | Gland development | 6 | 0.000233 | IGF1, IGF1R, PGR, FOXC1, ERBB4, FOXA1 |
| GO:0010033 | Response to organic substance | 11 | 0.000665 | HSPA2, ADCY1, GATA3, IGF1R, ABAT, RERG, SERPINA1, TGFBR3, ERBB4, NPY1R, ESR1 |
| GO:0021700 | Developmental maturation | 5 | 0.000798 | PGR, ERBB4, NTN4, RET, FOXA1 |
| GO:0001655 | Urogenital system development | 5 | 0.001098 | FOXC1, AGTR1, SOX11, RET, FOXA1 |
| GO:0030879 | Mammary gland development | 4 | 0.002238 | IGF1, IGF1R, PGR, ERBB4 |
| GO:0007610 | Behavior | 8 | 0.003072 | ADCY1, ABAT, PPP1R1B, ZIC1, CXCL14, NOVA1, NPY1R, S100A9 |
| GO:0050678 | Regulation of epithelial cell proliferation | 4 | 0.003147 | IGF1, PGR, TGFBR3, ERBB4 |
| GO:0048469 | Cell maturation | 4 | 0.003675 | PGR, NTN4, RET, FOXA1 |
| GO:0030334 | Regulation of cell migration | 5 | 0.00521 | IGF1, IGF1R, TGFBR3, ERBB4, PARD6B |
| GO:0001822 | Kidney development | 4 | 0.007315 | FOXC1, AGTR1, SOX11, RET |
| GO:0048545 | Response to steroid hormone stimulus | 5 | 0.008129 | GATA3, SERPINA1, ERBB4, NPY1R, ESR1 |
| GO:0051270 | Regulation of cell motion | 5 | 0.008276 | IGF1, IGF1R, TGFBR3, ERBB4, PARD6B |
| GO:0043627 | Response to estrogen stimulus | 4 | 0.00935 | GATA3, SERPINA1, NPY1R, ESR1 |
| GO:0040008 | Regulation of growth | 6 | 0.013202 | IGF1, RERG, FOXC1, AGTR1, NPY1R, MAPT |
| GO:0007167 | Enzyme linked receptor protein signaling pathway | 6 | 0.013356 | IGF1R, REPS2, FOXC1, TGFBR3, ERBB4, RET |
| GO:0007169 | Transmembrane receptor protein tyrosine kinase signaling pathway | 5 | 0.013730 | IGF1R, REPS2, FOXC1, ERBB4, RET |
| GO:0019932 | Second-messenger-mediated signaling | 5 | 0.016108 | ADCY1, IGF1, IGF1R, AGTR1, NPY1R |
| GO:0014031 | Mesenchymal cell development | 3 | 0.018849 | FOXC1, TGFBR3, RET |
| GO:0048762 | Mesenchymal cell differentiation | 3 | 0.018849 | FOXC1, TGFBR3, RET |
| GO:0060485 | Mesenchyme development | 3 | 0.019552 | FOXC1, TGFBR3, RET |
| GO:0002070 | Epithelial cell maturation | 2 | 0.020530 | PGR, FOXA1 |
| GO:0003006 | Reproductive developmental process | 5 | 0.023000 | HSPA2, IGF1R, PGR, FOXC1, FOXA1 |
| GO:0007626 | Locomotory behavior | 5 | 0.026563 | ABAT, CXCL14, NOVA1, NPY1R, S100A9 |
| GO:0014855 | Striated muscle cell proliferation | 2 | 0.028626 | FOXC1, TGFBR3 |
| GO:0008283 | Cell proliferation | 6 | 0.033837 | IGF1, PDZK1, FOXC1, TGFBR3, ERBB4, SOX11 |
| GO:0030182 | Neuron differentiation | 6 | 0.034411 | IGF1R, RTN1, NTN4, RET, PARD6B, FOXA1 |
| GO:0033002 | Muscle cell proliferation | 2 | 0.036656 | FOXC1, TGFBR3 |
| GO:0008015 | Blood circulation | 4 | 0.041680 | ABAT, FOXC1, AGTR1, NPY1R |
| GO:0042127 | Regulation of cell proliferation | 8 | 0.042679 | IGF1, IGF1R, RERG, PGR, AGTR1, TGFBR3, ERBB4, RARRES1 |
| GO:0006928 | Cell motion | 6 | 0.046139 | IGF1, FOXC1, TGFBR3, RET, DNALI1, S100A9 |
p-value represents the significance of enrichment and is estimated by Bonferroni test.
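The footnote above names a Bonferroni procedure. As a minimal sketch of what such an adjustment does, the snippet below multiplies each raw p-value by the number of tests and caps the result at 1. The source does not report the raw p-values or the number of GO terms tested, so the inputs (`raw`, `n_tests=10`) are purely hypothetical toy values, and the function name `bonferroni_adjust` is an illustration rather than the authors' tool.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    # If the total number of tests is not given, assume it equals the list length.
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three GO terms, out of 10 terms tested in total.
raw = [0.0001, 0.01, 0.2]
adjusted = bonferroni_adjust(raw, n_tests=10)
print(adjusted)
```

A term is then called significant when its adjusted value stays below the chosen threshold (commonly 0.05).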
Figure 2. Association of the 92-probe signature with clinical and survival information of 104 primary breast tumor patients in training dataset 1
A. Prognostic index in dataset 1. Each bar represents the prognostic index for an individual patient. B. The association of survival and clinical information within the two risk groups in dataset 1. C. The heatmap of the median-centered 92-probe expression profile (green, relatively high expression; sky blue, relatively low expression). D and E. Kaplan-Meier plots of the two subgroups in the training cohort predicted by CCP. p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data. CCP, compound covariate predictor; OS, overall survival; RFS, relapse free survival.
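The Kaplan-Meier plots and log-rank p values mentioned in these captions can be illustrated with a small, dependency-free sketch. It is not the authors' software: the function names and the toy follow-up times are hypothetical, and the p value uses the standard chi-square approximation with one degree of freedom. An event flag of 1 marks an observed event and 0 marks censoring (the ‘+’ symbols in the panels).

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    event_times = sorted(
        {t for t, e in zip(times_a + times_b, events_a + events_b) if e}
    )
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data (hypothetical): months of follow-up and event indicators.
high_risk = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low_risk = ([10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = logrank_test(*high_risk, *low_risk)
print(curve, chi2, p_value)
```

In practice a dedicated survival-analysis package would normally be used; the sketch only shows what the plotted step curves and the reported p values represent.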
Figure 3. Association of the 102 and the 92-probe sets with survival information of primary breast tumor patients in datasets 1, 2 and 3, respectively
A–F. Kaplan-Meier plots of the two subgroups were predicted by CCP. (A and B) Dataset 1. (C and D) dataset 2. (E and F) dataset 3. p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data. CCP, compound covariate predictor; OS, overall survival; DFS, disease free survival.
Figure 4. Construction of prediction model in test cohorts based on gene expression signature from data set 1
A. Schematic overview of the strategy used to construct prediction models and evaluate predicted outcomes based on the 92-probe signature. B–E. Kaplan-Meier survival plots. According to survival time, patients were stratified into two risk subgroups predicted by CCP. (B) Dataset 2. (C) Dataset 3. (D) Dataset 4. (E) Dataset 5. p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data. CCP, compound covariate predictor; OS, overall survival; DFS, disease free survival; DMFS, disease metastasis free survival.
Figure 5. Outcome predictions in the combined validation cohorts
Kaplan-Meier survival curves were constructed using 92-probe expression from the training dataset. A. Combination of data sets 2 and 3. B. Combination of five other U133 Plus 2 chip data sets. C. Combination of ten Affymetrix U133A chip data sets. Patients were stratified according to median prognostic index into two risk subgroups predicted by CCP. p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data.
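The median-based stratification described in this caption can be sketched as follows. A compound covariate predictor is commonly computed as a weighted sum of probe expression values, with per-probe weights taken from univariate test statistics. The weights, expression values and function names below are hypothetical placeholders, not the values used in the study. Patients with an index above the cutoff are labeled high risk and the rest low risk, matching the convention stated in the table footnotes of this page.

```python
from statistics import median


def prognostic_index(expression, weights):
    """Weighted sum of probe expression values (compound covariate score)."""
    return sum(w * x for w, x in zip(weights, expression))


def stratify_by_median(indices):
    """Split patients at the median index: above the cutoff is high risk."""
    cutoff = median(indices)
    groups = ["high" if pi > cutoff else "low" for pi in indices]
    return cutoff, groups


# Hypothetical two-probe signature and four patients (rows = patients).
weights = [1.0, -2.0]
patients = [[1, 0], [0, 1], [2, 0], [0, 2]]
indices = [prognostic_index(p, weights) for p in patients]
cutoff, groups = stratify_by_median(indices)
print(indices, cutoff, groups)
```

The two resulting groups are what the Kaplan-Meier curves compare; in a validation cohort the cutoff would typically be fixed in advance rather than re-estimated.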
The univariate and the multivariate cox proportional hazard regression analyses for patients' survival in France cohort
| Parameters | Univariate HR (95% CI) | Univariate p-value | Multivariate HR (95% CI) | Multivariate p-value |
|---|---|---|---|---|
| Age (years) | 0.997 (0.980–1.015) | 0.756 | 0.990 (0.969–1.010) | 0.328 |
| ER status (+/–) | 0.687 (0.440–1.072) | 0.098 | 0.977 (0.337–2.833) | 0.965 |
| PR status (+/–) | 0.816 (0.524–1.270) | 0.368 | 1.349 (0.527–3.455) | 0.533 |
| Lymph node (+/–) | 1.493 (0.952–2.341) | 0.081 | 1.357 (0.794–2.317) | 0.264 |
| Grade (1, 2, 3) | 1.592 (1.159–2.188) | 0.004 | 0.912 (0.605–1.376) | 0.661 |
| P53 status (yes/no) | 1.814 (1.100–2.991) | 0.020 | 1.378 (0.792–2.399) | 0.257 |
| Mol_Sub (I, II, III, IV, V, VI) | 1.037 (0.875–1.229) | 0.677 | 1.799 (1.272–2.544) | 0.001 |
| 92-probe signature (high/low) | 1.927 (1.237–3.002) | 0.004 | 7.125 (2.462–20.618) | <0.001 |
HR, hazard ratio; CI, confidence interval; ER, estrogen receptor; PR, progesterone receptor; Mol_Sub, molecular subtype. A low risk was defined as a prognostic index (PI) less than or equal to −0.272144, and a high risk as a PI higher than −0.272144.
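As a quick consistency check on the signature row of this table, a Wald p value can be approximately recovered from a hazard ratio and its 95% confidence interval. The sketch assumes the interval is symmetric on the log scale (the usual Wald construction); the function name `wald_p_from_ci` is illustrative and not part of any package.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p value from an HR and its 95% CI."""
    log_hr = math.log(hr)
    # Standard error on the log scale: the CI spans 2 * z_crit standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Values for the 92-probe signature row of the table above.
z_uni, p_uni = wald_p_from_ci(1.927, 1.237, 3.002)
z_multi, p_multi = wald_p_from_ci(7.125, 2.462, 20.618)
print(round(p_uni, 4), p_multi < 0.001)
```

Under this assumption the univariate interval gives a p value close to the reported 0.004, and the multivariate interval gives a value below 0.001, in line with the table.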
Multivariate analysis of age, ER-, PR-status, lymph node, grade and 92-probe signature in relation to the patient's survival
| Parameters | HR (95% CI) | p-value |
|---|---|---|
| | 0.996 (0.980–1.013) | 0.674 |
| | 0.570 (0.269–1.204) | 0.141 |
| | 1.148 (0.555–2.371) | 0.710 |
| | 1.799 (1.161–2.788) | 0.009 |
| | 1.536 (1.121–2.105) | 0.008 |
| | 0.996 (0.979–1.014) | 0.674 |
| | 1.111 (0.479–2.577) | 0.806 |
| | 1.269 (0.610–2.639) | 0.524 |
| | 1.856 (1.197–2.876) | 0.006 |
| | 1.382 (1.000–1.908) | 0.050 |
| | 2.746 (1.443–5.227) | 0.002 |
The multivariate model included 301 patients for DFS, owing to missing values in 22 patients. A low risk was defined as a prognostic index (PI) less than or equal to −0.272144, and a high risk as a PI higher than −0.272144. HR, hazard ratio; CI, confidence interval; ER, estrogen receptor; PR, progesterone receptor.
Multivariate analysis of age, ER-, PR-status, lymph node, grade and 92-probe signature in relation to the 5-year survival
| Parameters | HR (95% CI) | p-value |
|---|---|---|
| | 0.984 (0.971–0.998) | 0.025 |
| | 0.510 (0.270–0.961) | 0.037 |
| | 0.905 (0.479–1.710) | 0.757 |
| | 1.517 (1.062–2.165) | 0.022 |
| | 1.708 (1.291–2.258) | <0.001 |
| | 0.984 (0.970–0.998) | 0.023 |
| | 0.878 (0.423–1.825) | 0.728 |
| | 0.974 (0.512–1.853) | 0.936 |
| | 1.539 (1.079–2.197) | 0.017 |
| | 1.562 (1.173–2.080) | 0.002 |
| | 2.239 (1.265–3.963) | 0.006 |
A low risk was defined as a prognostic index (PI) less than or equal to −0.272144, and a high risk as a PI higher than −0.272144. HR, hazard ratio; CI, confidence interval; ER, estrogen receptor; PR, progesterone receptor.
Figure 6. Significant association of the 92-probe signature with ER status in different datasets
A–F. Kaplan-Meier curves of patients in ER-negative and ER-positive groups. Patients were classified according to the prognostic index of the 92-probe signature. (A and B) Dataset 1. (C and D) Dataset 3. (E and F) Dataset 5. p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data. DFS, disease free survival; RFS, relapse free survival.
Figure 7. Significant association of the 92-probe signature with lymph node status in different datasets
A–H. Kaplan-Meier curves of patients in lymph node-negative and lymph node-positive groups. Patients were classified according to the prognostic index of the 92-probe signature. (A and B) Dataset 1. (C and D) Dataset 3 including all tumor grades. (E and F) Dataset 3 including tumor grade 1 and 2 or pT1 and pT2. (G and H) Canada cohorts including datasets (6, 7 and 8). p values were obtained from log-rank test. The ‘+’ symbols in the panels indicate censored data. DFS, disease free survival; RFS, relapse free survival.
Figure 8. Network analysis of the 92-probe signature in primary breast cancer
Node and edge size were generated according to the number of connections within the module.