Daichi Shigemizu, Takuji Iwase, Masataka Yoshimoto, Yasuyo Suzuki, Fuyuki Miya, Keith A Boroevich, Toyomasa Katagiri, Hitoshi Zembutsu, Tatsuhiko Tsunoda.
Abstract
The goal of this study is to establish a method for predicting overall survival (OS) and disease-free survival (DFS) in breast cancer patients after surgical operation. The gene expression profiles of cancer tissues from the patients, who underwent complete surgical resection of breast cancer and were subsequently monitored for postoperative survival, were analyzed using cDNA microarrays. We detected seven and three probes/genes associated with the postoperative OS and DFS, respectively, from our discovery cohort data. By incorporating these genes associated with the postoperative survival into MammaPrint genes, often used to predict prognosis of patients with early-stage breast cancer, we constructed postoperative OS and DFS prediction models from the discovery cohort data using a Cox proportional hazard model. The predictive ability of the models was evaluated in another independent cohort using Kaplan-Meier (KM) curves and the area under the receiver operating characteristic curve (AUC). The KM curves showed a statistically significant difference between the predicted high- and low-risk groups in both OS (log-rank trend test P = 0.0033) and DFS (log-rank trend test P = 0.00030). The models also achieved high AUC scores of 0.71 in OS and of 0.60 in DFS. Furthermore, our models had improved KM curves when compared to the models using MammaPrint genes (OS: P = 0.0058, DFS: P = 0.00054). Similar results were observed when our model was tested in publicly available datasets. These observations indicate that there is still room for improvement in the current methods of predicting postoperative OS and DFS in breast cancer.
Keywords: Breast cancer; MammaPrint genes; disease-free survival; overall survival; prediction model
Year: 2017 PMID: 28544536 PMCID: PMC5504310 DOI: 10.1002/cam4.1092
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
OS and DFS rates for all patients in the first and second discovery cohorts
| Cohort | OS: dead (# patients) | OS: survive (# patients) | OS: # probes | DFS: relapse (# patients) | DFS: nonrelapse (# patients) | DFS: # probes |
|---|---|---|---|---|---|---|
| First cohort | 11 | 70 | 9,980 | 19 | 62 | 9,973 |
| Second cohort | 8 | 8 | 15,589 | 5 | 11 | 15,102 |
| Combined cohort | 19 | 78 | 10,337 | 24 | 73 | 10,611 |
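The table reports patient counts rather than rates. As a small worked example, the crude event proportions can be derived from those counts as follows. This is only a sketch: the dictionary and function names are illustrative, and these are simple proportions, not Kaplan-Meier estimates, which would also need the follow-up times.

```python
# Counts copied from the table above: (events, non-events) per cohort.
os_counts = {
    "First cohort": (11, 70),
    "Second cohort": (8, 8),
    "Combined cohort": (19, 78),
}
dfs_counts = {
    "First cohort": (19, 62),
    "Second cohort": (5, 11),
    "Combined cohort": (24, 73),
}


def event_rate(events, non_events):
    """Fraction of patients in a cohort who experienced the event."""
    return events / (events + non_events)


os_rates = {name: event_rate(*c) for name, c in os_counts.items()}
dfs_rates = {name: event_rate(*c) for name, c in dfs_counts.items()}

for name in os_counts:
    print(f"{name}: death rate {os_rates[name]:.1%}, "
          f"relapse rate {dfs_rates[name]:.1%}")
```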
Genes affecting postoperative OS and DFS using Cox proportional hazard models
| Gene symbol (Gene ID) | (1) P, first discovery cohort | (2) Coefficient, first discovery cohort | (2) Coefficient, second discovery cohort | (3) q | (3) HR | (3) 95% CI |
|---|---|---|---|---|---|---|
| OS | | | | | | |
| | 1.27 × 10⁻⁶ | −1.31 | −0.075 | 0.041 | 0.67 | 0.52–0.87 |
| | 2.95 × 10⁻⁵ | 1.79 | 0.04 | 0.034 | 2.20 | 1.34–3.60 |
| | 0.00012 | 1.30 | 0.38 | 0.042 | 2.02 | 1.27–3.21 |
| | 0.00018 | 1.01 | 0.71 | 0.0012 | 2.43 | 1.62–3.64 |
| | 0.00020 | −1.45 | −0.15 | 0.00046 | 0.29 | 0.18–0.48 |
| | 0.00022 | 1.32 | 0.33 | 0.045 | 1.96 | 1.24–3.10 |
| | 0.00024 | 1.1280 | 0.04675 | 0.071 | 1.74 | 1.16–2.62 |
| DFS | | | | | | |
| | 5.52 × 10⁻⁵ | 0.90 | 0.067 | 0.14 | 2.31 | 1.55–3.44 |
| | 6.33 × 10⁻⁵ | −1.04 | −0.32 | 0.17 | 0.43 | 0.28–0.67 |
| | 0.00025 | −0.60 | −0.42 | 0.18 | 0.59 | 0.44–0.79 |
We performed gene selection through the following steps: (1) We selected the genes most significantly associated with outcome (OS: survival vs. dead; DFS: relapse vs. nonrelapse) in the first discovery cohort data using a Cox proportional hazard model (P < 0.0003). (2) Of these, we kept the genes whose effects in the second discovery cohort were in the same direction as in step 1. (3) Of these, we kept the genes showing q < 0.2 in the combined discovery cohort. HR, hazard ratio.
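The three selection steps can be sketched as a filter over per-gene statistics that have already been computed. This is a minimal illustration, not the authors' pipeline: the `GeneStats` class, the `select_genes` function and the row labels are hypothetical, the two "made_up" rows are invented to show rejections, and how the q-values were derived is not stated here, so they are taken as given inputs.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    label: str          # hypothetical identifier (gene symbols are not listed here)
    p_first: float      # step 1: Cox P-value in the first discovery cohort
    coef_first: float   # Cox coefficient in the first discovery cohort
    coef_second: float  # Cox coefficient in the second discovery cohort
    q_combined: float   # step 3: q-value in the combined discovery cohort


def select_genes(stats, p_cut=0.0003, q_cut=0.2):
    """Return labels of genes that pass all three selection steps."""
    selected = []
    for g in stats:
        passes_step1 = g.p_first < p_cut
        same_direction = g.coef_first * g.coef_second > 0  # step 2: same sign
        passes_step3 = g.q_combined < q_cut
        if passes_step1 and same_direction and passes_step3:
            selected.append(g.label)
    return selected


candidates = [
    GeneStats("row_OS_1", 1.27e-06, -1.31, -0.075, 0.041),  # values from the table
    GeneStats("row_DFS_3", 0.00025, -0.60, -0.42, 0.18),    # values from the table
    GeneStats("made_up_A", 0.00010, 1.20, -0.30, 0.05),     # invented: direction flips
    GeneStats("made_up_B", 0.00100, 0.80, 0.50, 0.01),      # invented: fails P cutoff
]
print(select_genes(candidates))
```

Checking the sign of the product of the two coefficients is one simple way to express "same direction"; a coefficient of exactly zero would be rejected under this choice.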
Figure 1. Kaplan–Meier curves and receiver operating characteristic curves for the prediction models using the MammaPrint gene set. Based on the MammaPrint gene set, prediction models were constructed from our combined discovery cohort data using a Cox proportional hazard model. A prognostic index was calculated for each subject by applying the MammaPrint gene set to each of the prediction models. Based on this prognostic index, the optimal cutoff values, indicated by the minimum log-rank trend test P-value, were determined by comparing the high- (red) and low-risk (black) groups in the combined discovery cohort for cumulative overall survival (OS) (A) and disease-free survival (DFS) (B). OS: optimal cutoff = 6.59, minimum P < 1.11 × 10⁻¹⁶; DFS: optimal cutoff = 8.16, minimum P = 7.99 × 10⁻¹⁵. The AUC was 0.84 (95% CI = 0.73–0.95) for OS (C) and 0.68 (95% CI = 0.56–0.81) for DFS (D). The receiver operating characteristic curve achieved a maximum sensitivity of 0.74 and specificity of 0.83 for OS (C), and a maximum sensitivity of 0.54 and specificity of 0.81 for DFS (D).
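The prognostic index and cutoff search described in this caption can be sketched as follows. Several things here are assumptions rather than the authors' implementation: the index is taken to be the Cox linear predictor (sum of coefficient times expression), a standard two-group log-rank test stands in for the "log-rank trend test", candidate cutoffs are midpoints between sorted index values, and all function names and toy numbers are made up for illustration.

```python
import math


def prognostic_index(coefs, expression):
    """Cox linear predictor: sum of coefficient * expression value."""
    return sum(b * x for b, x in zip(coefs, expression))


def logrank_p(times, events, in_high):
    """Two-group log-rank test P-value (chi-square, 1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        died = [i for i in at_risk if times[i] == t and events[i]]
        d = len(died)
        observed += sum(1 for i in died if in_high[i])
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def best_cutoff(pi, times, events):
    """Scan midpoints between sorted index values; return (cutoff, min P)."""
    values = sorted(set(pi))
    best = (None, 1.0)
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        p = logrank_p(times, events, [v >= cut for v in pi])
        if p < best[1]:
            best = (cut, p)
    return best


# Toy data (invented): two genes, six patients.
coefs = [0.5, -0.3]
expression = [[4.0, 1.0], [3.5, 1.5], [3.0, 2.0],
              [1.0, 3.0], [0.5, 3.5], [0.2, 4.0]]
times = [5, 8, 12, 40, 50, 60]   # follow-up time
events = [1, 1, 1, 0, 1, 0]      # 1 = event observed, 0 = censored

pi = [prognostic_index(coefs, x) for x in expression]
cutoff, p_min = best_cutoff(pi, times, events)
print(f"cutoff = {cutoff:.2f}, minimum log-rank P = {p_min:.4f}")
```

Because the cutoff is chosen to minimize the P-value on the same data, the resulting P-value is optimistic, which is one reason the evaluation in an independent cohort matters.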
Figure 2. Kaplan–Meier curves and receiver operating characteristic curves for the prediction models using our improved gene sets. Based on our improved gene sets, prediction models were constructed from our combined discovery cohort data using a Cox proportional hazard model. A prognostic index was calculated for each subject by applying the improved gene sets to each of the prediction models. Based on this prognostic index, the optimal cutoff values, indicated by the minimum log-rank trend test P-value, were determined by comparing the high- (red) and low-risk (black) groups in the combined discovery cohort for cumulative overall survival (OS) (A) and disease-free survival (DFS) (B). OS: optimal cutoff = 7.14, minimum P < 1.11 × 10⁻¹⁶; DFS: optimal cutoff = 9.60, minimum P = 7.99 × 10⁻¹⁵. The AUC was 0.92 (95% CI = 0.86–0.99) for OS (C) and 0.72 (95% CI = 0.60–0.84) for DFS (D). The receiver operating characteristic curve achieved a maximum sensitivity of 0.79 and specificity of 0.91 for OS (C), and a maximum sensitivity of 0.79 and specificity of 0.62 for DFS (D).
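The AUC, sensitivity and specificity quoted in these captions can be illustrated with a short sketch. It assumes the outcome is treated as a binary label (event vs. no event) and that a higher prognostic index means higher risk; the function names and toy values are invented, and the AUC is computed by the pairwise-comparison (Mann-Whitney) definition rather than by any particular library.

```python
def auc(scores, labels):
    """Probability that a random event case scores above a random non-event case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called high risk."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy prognostic indices and outcomes (invented for illustration).
scores = [1.7, 1.3, 0.9, -0.4, -0.8, -1.1]
labels = [1, 1, 1, 0, 1, 0]  # 1 = event, 0 = no event

print("AUC:", auc(scores, labels))
print("sensitivity, specificity at cutoff 0.25:", sens_spec(scores, labels, 0.25))
```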
Figure 3. Verification of our OS prediction models using the GSE42568 cohort (validation cohort) obtained from the GEO database. Kaplan–Meier curves for the OS prediction models using the MammaPrint gene set (A) and our improved OS gene set (B), evaluated on the GSE42568 data. Receiver operating characteristic curves for the OS prediction models using the MammaPrint gene set (C) and our improved OS gene set (D), evaluated on the GSE42568 data.
Figure 4. Verification of our DFS prediction models using the GSE42568 cohort (validation cohort) obtained from the GEO database. Kaplan–Meier curves for the DFS prediction models using the MammaPrint gene set (A) and our improved DFS gene set (B), evaluated on the GSE42568 data. Receiver operating characteristic curves for the DFS prediction models using the MammaPrint gene set (C) and our improved DFS gene set (D), evaluated on the GSE42568 data.