Richard E Neapolitan, Xia Jiang.
Abstract
BACKGROUND: Studies show that thousands of genes are associated with breast cancer prognosis. To make use of the available genetic data, researchers have tried to predict outcomes from gene expression data, and several commercial products have been developed. These products have the following shortcomings: 1) They use the Cox proportional hazards model for prediction, although the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) They were not tested to determine whether a complete set of clinical predictors could predict as well as the gene expression signatures. METHODOLOGY/
Year: 2015 PMID: 25723490 PMCID: PMC4344205 DOI: 10.1371/journal.pone.0117658
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The clinical variables used to predict survival.
| Variable | Description | Values |
|---|---|---|
| age_at_diagnosis | age at diagnosis of the disease | 0–39, 39–54, 54–69, 69–84, 84–100 |
| size | tumor size in mm | 0–20, 20–50, 50–180 |
| lymph_nodes_positive | number of positive lymph nodes | 0, 1, 2–3, 4–5, 6–9, ≥10 |
| grade | grade of disease | 1, 2, 3 |
| histological | tumor histology | IDC, IDC+ILC, IDC-TUB, IDC-MUC, IDC-MED, MIXED NST AND A SPECIAL TYPE, OTHER, OTHER INVASIVE, INVASIVE TUMOR |
| ER_IHC_status | ER status determined by immunohistochemistry (IHC) | pos, neg |
| ER_Expr | estrogen receptor expression | +, - |
| PR_Expr | progesterone receptor expression | +, - |
| HER2_SNP6_state | HER2 copy number gain or loss | NEUT, GAIN, LOSS |
| HER2_Expr | HER2 expression | +, - |
| treatment | treatment received (HT = hormone therapy, RT = radiotherapy, CT = chemotherapy) | None, HT, RT, CT, HT/RT, HT/CT, RT/CT, HT/RT/CT |
| inf_men_status | inferred menopausal status | pre, post |
| group | characterizes patients by lymph node status, chemo- and hormonal therapy | 1, 2, 3, 4, other |
| stage | composite of size and number of lymph nodes positive | numeric |
| lymph_nodes_removed | number of lymph nodes removed | numeric |
| NPI | the Nottingham Prognostic Index, a composite of tumor size, number of lymph nodes positive, and grade | numeric |
| cellularity | cells seen on histopathology | high, low, moderate |
| Pam50_subtype | subtype inferred from expression data for 50 genes | Basal, Her2, LumA, LumB, NC, Normal |
| int_clust_memb | cluster membership according to METABRIC | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| site | collection site information specific to METABRIC | 1, 2, 3, 4, 5 |
| Genefu | A composite of other variables used by METABRIC | ER+/HER2-, High Prolif, Low Prolif, ER-/HER2-, HER2+ |
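Several variables in this table are listed with discrete ranges, which suggests that continuous measurements were discretized before modeling. Below is a minimal sketch for lymph_nodes_positive. It assumes non-negative integer counts and the bin boundaries in the Values column. `bin_lymph_nodes` is a hypothetical helper, not the authors' preprocessing code. Overlapping boundaries, such as those for age_at_diagnosis (39 appears in two ranges), would need an explicit convention, and the table does not state one.

```python
from bisect import bisect_right

# Lower bound of every bin after the first, following the Values column for
# lymph_nodes_positive: 0, 1, 2–3, 4–5, 6–9, ≥10.
EDGES = [1, 2, 4, 6, 10]
LABELS = ["0", "1", "2–3", "4–5", "6–9", "≥10"]

def bin_lymph_nodes(count: int) -> str:
    """Map a count of positive lymph nodes to its discrete category."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # bisect_right counts how many lower bounds are <= count, which is the bin index.
    return LABELS[bisect_right(EDGES, count)]

print([bin_lymph_nodes(n) for n in (0, 3, 7, 12)])
```

Using `bisect_right` on the lower bounds keeps each boundary value (for example 2 or 10) in the bin that starts at that value, which matches the ranges as written.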
A table developed from the METABRIC data set.
| Patient |  | … | Xn |  |  | … |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | alive | alive | alive | alive | alive | |||||
| 2 | alive | dead | dead | dead | dead | |||||
| 3 | alive | alive | NA | NA | NA | |||||
| … |
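The rows above record each patient's status as alive, dead, or NA at a series of follow-up points, with NA appearing once follow-up ends. The sketch below shows one plausible way to derive such labels from a follow-up time and a death indicator. The function name and the exact rule are assumptions, not the authors' documented procedure. This applies especially to how a patient followed exactly up to the horizon is treated.

```python
from typing import Optional

def status_at_horizon(followup_years: float, died: bool, horizon: float) -> Optional[str]:
    """Label a patient's survival status at a fixed horizon in years.

    Hypothetical rule: 'dead' if death was observed at or before the horizon,
    'alive' if follow-up reached the horizon without an earlier death, and None
    (the table's NA) if the patient was censored alive before the horizon.
    """
    if died and followup_years <= horizon:
        return "dead"
    if followup_years >= horizon:
        return "alive"
    return None  # censored before the horizon: status unknown

# A patient censored alive at 7 years, labelled at 5-, 10- and 15-year horizons.
print([status_at_horizon(7.0, False, h) for h in (5, 10, 15)])
```

Under this rule, a censored patient yields the alive, alive, NA, NA pattern seen for patient 3, and a patient who dies yields alive followed by dead, as for patient 2.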
Comparison of the Cox concordance index results and RSF concordance index results for each type of data.
| Year | Clinical_Only (Cox) | Clinical_Only (RSF) | Clinical_PAM (Cox) | Clinical_PAM (RSF) | Clinical_Gene (Cox) | Clinical_Gene (RSF) |
|---|---|---|---|---|---|---|
| 5 | 0.709 | 0.714 | 0.713 | 0.721 | 0.703 (30) | 0.720 (150) |
| 10 | 0.718 | 0.723 | 0.720 | 0.724 | 0.719 (30) | 0.731 (50) |
| 15 | 0.694 | 0.692 | 0.698 | 0.705 | 0.696 (30) | 0.706 (100) |
For Clinical_Gene, these are the best results obtained by each model, all with Method 2. The number in parentheses is the number of ReliefF-selected features that yielded the best result. Except for the Year 15 entry for Clinical_Only, the difference between the Cox and RSF results is significant at p < 2.2 × 10⁻¹⁶.
Fig 1. Comparison of the Cox concordance index results and the RSF concordance index results for each type of data.
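The tables report concordance indices. A common estimator is Harrell's C. Among comparable pairs of patients (pairs in which the patient with the shorter follow-up died), it is the fraction in which the model assigns higher risk to the patient who died first. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The source does not say which estimator produced these numbers. The sketch below, with the hypothetical name `harrell_c_index`, is therefore illustrative only, and it skips pairs with tied times for simplicity.

```python
from itertools import combinations
from typing import Sequence

def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index (simplified: tied times are skipped).

    times  -- observed follow-up time for each patient
    events -- True if death was observed, False if censored
    risk   -- predicted risk score (higher means worse expected prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that 'a' has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter one ends in death.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

print(harrell_c_index([2, 4, 6, 8], [True, True, False, True], [0.9, 0.2, 0.5, 0.1]))
```

In the usage line, four of the five comparable pairs are ranked correctly. The pair of patients observed at 4 and 6 years is the exception, so the index is 0.8.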
Comparison of the best concordance index results obtained for each data set over both models (Cox and RSF) and all tested values of the number of features selected by ReliefF.
| Year | Clinical_Only | Clinical_PAM | Clinical_Gene | p-value (Clinical_Only vs. Clinical_PAM) | p-value (Clinical_Only vs. Clinical_Gene) |
|---|---|---|---|---|---|
| 5 | 0.714 | 0.721 | 0.720 | 1.17 × 10⁻¹⁴ | 6.77 × 10⁻¹⁴ |
| 10 | 0.723 | 0.724 | 0.731 | 0.251 | < 2.2 × 10⁻¹⁶ |
| 15 | 0.694 | 0.705 | 0.706 | < 2.2 × 10⁻¹⁶ | < 2.2 × 10⁻¹⁶ |
The fifth column gives the p-value for comparing Clinical_Only with Clinical_PAM. The sixth column gives the p-value for comparing Clinical_Only with Clinical_Gene.
Fig 2. Comparison of the best results obtained for each data set over both models (Cox and RSF) and all tested values of the number of features selected by ReliefF.
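The best-results table can be cross-checked against the Cox/RSF table. For each year and data set, take the larger of the two model scores. The Clinical_Gene entries in that table are already the best over the number of ReliefF features. The sketch below copies those scores, reading the Year 15 Clinical_Only RSF entry as 0.692. The dictionary layout and function name are illustrative choices.

```python
# (Cox, RSF) concordance indices copied from the Cox/RSF comparison table.
# For Clinical_Gene these are already the best over the number of ReliefF features.
COX_RSF = {
    5: {"Clinical_Only": (0.709, 0.714), "Clinical_PAM": (0.713, 0.721), "Clinical_Gene": (0.703, 0.720)},
    10: {"Clinical_Only": (0.718, 0.723), "Clinical_PAM": (0.720, 0.724), "Clinical_Gene": (0.719, 0.731)},
    15: {"Clinical_Only": (0.694, 0.692), "Clinical_PAM": (0.698, 0.705), "Clinical_Gene": (0.696, 0.706)},
}

def best_over_models(results):
    """Keep the higher of the Cox and RSF scores for each year and data set."""
    return {
        year: {name: max(scores) for name, scores in by_set.items()}
        for year, by_set in results.items()
    }

for year, best in best_over_models(COX_RSF).items():
    print(year, best)
```

The maxima match the best-results table row for row. They also show that the RSF model supplies the best score in every cell except Year 15 Clinical_Only.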
Comparison of the best concordance index results obtained using Clinical_Gene and the RSF model to the concordance index results obtained using Clinical_Only and the Cox model.
| Year | Clinical_Only/Cox | Clinical_Gene/RSF | Relative Increase |
|---|---|---|---|
| 5 | 0.709 | 0.720 | 0.015 |
| 10 | 0.718 | 0.731 | 0.018 |
| 15 | 0.694 | 0.706 | 0.017 |
All results are significant at p < 2.2 × 10⁻¹⁶.
Fig 3. Comparison of the best concordance index results obtained using Clinical_Gene and the RSF model to the concordance index results obtained using Clinical_Only and the Cox model.
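The increase column expresses the gain of Clinical_Gene/RSF over Clinical_Only/Cox relative to the baseline, that is (new − old) / old. The sketch below recomputes it from the three-decimal table entries. The helper name is hypothetical. The recomputed values come out at roughly 0.0155, 0.0181, and 0.0173. These are close to the reported increases (about 1.5 to 1.8 percent), and small differences are presumably due to the table entries themselves being rounded.

```python
# Concordance indices copied from the table above:
# year -> (Clinical_Only/Cox, Clinical_Gene/RSF)
PAIRS = {5: (0.709, 0.720), 10: (0.718, 0.731), 15: (0.694, 0.706)}

def relative_increase(baseline: float, improved: float) -> float:
    """Fractional increase of `improved` over `baseline`."""
    return (improved - baseline) / baseline

for year, (cox, rsf) in PAIRS.items():
    print(f"{year:>2} years: {relative_increase(cox, rsf):.4f}")
```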
Fig 4. Heat map clustering 1981 breast cancer tumors and the top 150 genes selected by ReliefF from the entire data set for 5-year survival prediction.
Fig 5. Heat map clustering 1981 breast cancer tumors and the top 50 genes selected by ReliefF from the entire data set for 10-year survival prediction.
Fig 6. Heat map clustering 1981 breast cancer tumors and the top 100 genes selected by ReliefF from the entire data set for 15-year survival prediction.
Predictive genes shared by each pair of time frames, among the top 150 genes extracted by ReliefF.
| 5 / 10 year | 5 / 15 year | 10 / 15 year |
|---|---|---|
| OR2AG1 | MND1 | MND1 |
| EGFL7 | CKAP2L | CDKN3 |
| SLC30A8 | PTTG3P | |
| RRM2 | PRC1 | |
| MND1 | RACGAP1 | |
| HSPB1 | NCOA3 | |
| C2 | HMGCS1 | |
| FANCE | CKAP2L | |
| OSBPL1A | CDCA5 | |
| CEBPG | CEP55 | |
| F2RL3 | KIF4A | |
| CCNA2 | SFRP1 | |
| CKAP2L | ||
| PTK2B | ||
| C7orf41 | ||
| CD84 | ||
| CD4 | ||
| DSCC1 |
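The table above amounts to pairwise intersections of the per-horizon ReliefF gene lists. The sketch below computes such intersections. The gene names in the example are placeholders, not the paper's rankings, and `shared_genes` is a hypothetical helper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def shared_genes(top_genes: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Genes common to each pair of time frames' ranked lists.

    Shared genes are returned in the order of the first time frame's ranking.
    """
    shared = {}
    for a, b in combinations(top_genes, 2):
        in_b = set(top_genes[b])
        shared[(a, b)] = [gene for gene in top_genes[a] if gene in in_b]
    return shared

# Placeholder gene names for illustration only (not the paper's rankings).
example = {
    "5 year": ["G1", "G2", "G3"],
    "10 year": ["G2", "G4", "G1"],
    "15 year": ["G4", "G5"],
}
for pair, genes in shared_genes(example).items():
    print(pair, genes)
```

Iterating over the dictionary keys with `combinations` produces the same three pairings as the table's columns: 5/10, 5/15, and 10/15 years.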
Clinical features selected by ReliefF under Method 1.
| 5 Year Rank | 5 Year Feature | 10 Year Rank | 10 Year Feature | 15 Year Rank | 15 Year Feature |
|---|---|---|---|---|---|
| 1 | lymph_nodes_positive | 1 | group | 5 | NPI |
| 2 | group | 2 | lymph_nodes_positive | 77 | group |
| 49 | stage | 3 | NPI | 129 | treatment |
| 110 | size | 12 | size | 138 | int_clust_memb |
|  |  | 14 | site |  |  |
|  |  | 102 | stage |  |  |
|  |  | 145 | age_at_diagnosis |  |  |
Rank is the position of the feature among the top 150 features selected by ReliefF.
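The Rank column comes from ReliefF, which scores features by how well they separate nearby patients with different outcomes. The sketch below is a basic two-class ReliefF. It uses Manhattan distance, the k nearest hits and misses, and range-normalized numeric features. It assumes the survival outcome at a given horizon has been reduced to a binary label such as alive/dead. The function name, defaults, and feature handling are assumptions, and the paper's actual ReliefF settings are not given in this section. Categorical clinical variables would need a 0/1 difference instead of a range-scaled one.

```python
import random
from typing import List, Optional, Sequence

def relieff(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3,
            m: Optional[int] = None, seed: int = 0) -> List[float]:
    """Basic ReliefF feature weights for a two-class problem with numeric features.

    Each per-feature difference is divided by the feature's range, so it lies in
    [0, 1]. A feature gains weight when it differs between a sampled patient and
    its nearest neighbours of the other class (misses) and loses weight when it
    differs from its nearest neighbours of the same class (hits).
    """
    n, p = len(X), len(X[0])
    span = []
    for f in range(p):
        col = [row[f] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # constant feature: avoid /0

    def diff(f: int, a: int, b: int) -> float:
        return abs(X[a][f] - X[b][f]) / span[f]

    def dist(a: int, b: int) -> float:
        return sum(diff(f, a, b) for f in range(p))  # Manhattan distance

    m = n if m is None else m
    rng = random.Random(seed)
    w = [0.0] * p
    for r in rng.sample(range(n), m):
        # All other patients, nearest first.
        others = sorted((i for i in range(n) if i != r), key=lambda i: dist(r, i))
        hits = [i for i in others if y[i] == y[r]][:k]
        misses = [i for i in others if y[i] != y[r]][:k]
        for f in range(p):
            if hits:
                w[f] -= sum(diff(f, r, h) for h in hits) / (len(hits) * m)
            if misses:
                w[f] += sum(diff(f, r, s) for s in misses) / (len(misses) * m)
    return w

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y, k=2)
print(sorted(range(len(w)), key=lambda f: w[f], reverse=True))
```

Sorting feature indices by descending weight, as in the last line, gives a ranking analogous to the Rank column. In the toy data the separating feature receives a positive weight and the uninformative one a negative weight.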