Zhiyuan Hu, Cheng Fan, Daniel S Oh, J S Marron, Xiaping He, Bahjat F Qaqish, Chad Livasy, Lisa A Carey, Evangeline Reynolds, Lynn Dressler, Andrew Nobel, Joel Parker, Matthew G Ewend, Lynda R Sawyer, Junyuan Wu, Yudong Liu, Rita Nanda, Maria Tretiakova, Alejandra Ruiz Orrico, Donna Dreher, Juan P Palazzo, Laurent Perreard, Edward Nelson, Mary Mone, Heidi Hansen, Michael Mullins, John F Quackenbush, Matthew J Ellis, Olufunmilayo I Olopade, Philip S Bernard, Charles M Perou.
Abstract
BACKGROUND: Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk-stratification. However, validation is often unconvincing because the size of the test set is typically small. To overcome this problem we used publicly available breast cancer gene expression data sets and a novel approach to data fusion, in order to validate a new breast tumor intrinsic list.
Year: 2006 PMID: 16643655 PMCID: PMC1468408 DOI: 10.1186/1471-2164-7-96
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the analysis methods and datasets used in this paper.
Figure 2. Hierarchical cluster analysis of the 315-sample combined test set using the Intrinsic/UNC gene set reduced to 306 genes. (A) Overview of complete cluster diagram. (B) Experimental sample-associated dendrogram. (C) Luminal/ER+ gene cluster with GATA3-regulated genes highlighted in pink. (D) HER2 and GRB7-containing expression cluster. (E) Interferon-regulated cluster containing STAT1. (F) Basal epithelial cluster. (G) Proliferation cluster.
Figure 3. Kaplan-Meier survival curves of breast tumors classified by intrinsic subtype. Survival curves are shown for (A) the 315-sample combined test set classified by hierarchical clustering using the Intrinsic/UNC gene set and (B) the 60-sample Ma et al., (C) 96-sample Chang et al., and (D) 105-sample (used to derive the Intrinsic/UNC gene set) datasets classified by the Nearest-Centroid predictor (Single Sample Predictor).
Figure 4. Kaplan-Meier survival curves using RFS as the endpoint, for the common clinical parameters present within the 315-sample combined test set. Survival curves are shown for (A) ER status, (B) node status, (C) histologic grade (1 = well-differentiated, 2 = intermediate, 3 = poor), and (D) tumor size (1 = diameter of 2 cm or less; 2 = diameter greater than 2 cm and less than or equal to 5 cm; 3 = diameter greater than 5 cm; 4 = any size with direct extension to chest wall or skin).
Multivariate Cox proportional hazards analysis of (A) standard clinical factors alone, or with (B) the Intrinsic Subtypes in relation to Relapse-Free Survival for the 315-sample combined test set. Size was a binary variable (0 = diameter of 2 cm or less, 1 = greater than 2 cm); node status was a binary variable (0 = no positive nodes, 1 = one or more positive nodes); age was a continuous variable formatted as decade-years. Hazard ratios for Intrinsic Subtypes were calculated relative to the Luminal A subtype. Variables found to be significant (p < 0.05) in the Cox proportional hazards model are shown in bold.
| Variable | Hazard ratio (confidence interval) | p-value |
| --- | --- | --- |
| (A) Standard clinical factors alone | | |
| Age, per decade | 1.04 (0.90–1.20) | 0.64 |
| ER status | | |
| Node status | 1.41 (0.98–2.04) | 0.07 |
| Tumor grade 2 | | |
| Tumor grade 3 | | |
| Size | | |
| (B) Standard clinical factors with Intrinsic Subtypes | | |
| Age, per decade | 1.08 (0.94–1.24) | 0.29 |
| ER status | 0.69 (0.42–1.13) | 0.14 |
| Node status | 1.35 (0.92–1.98) | 0.13 |
| Tumor grade 2 | 1.88 (0.82–4.32) | 0.14 |
| Tumor grade 3 | | |
| Size | | |
| Basal-like vs. LumA | | |
| HER2+/ER- vs. LumA | | |
| LumB vs. LumA | | |
| IFN vs. LumA | 1.40 (0.67–2.91) | 0.37 |
| Normal-like vs. LumA | 1.56 (0.59–4.16) | 0.37 |
Association between tumor histologic grade and intrinsic subtype in the 315-sample combined test set.
| 1 (well) | 29 | 2 | 1 | 0 | 1 |
| 2 (intermediate) | 45 | 26 | 8 | 6 | 16 |
| 3 (poor) | 15 | 32 | 16 | 21 | 67 |
| p-value† <0.0001 | |||||
| Cramer's V†† 0.42 | |||||
†p-value calculated from Chi-square test on contingency table. ††Cramer's V statistic (value can range from 0 to 1) measures the strength of association between the two variables analyzed in the contingency table, with 1 indicating perfect association and 0 indicating no association.
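The Chi-square test and Cramer's V described in the footnote can be sketched from the counts in the grade-by-subtype table. The following is a minimal illustration using only the Python standard library, not the authors' code: the function names (`chi_square`, `cramers_v`, `chi2_sf_even_dof`) are hypothetical, the column order is taken as printed, and the p-value helper uses the closed-form Chi-square tail that holds only for an even number of degrees of freedom (here (3 - 1) x (5 - 1) = 8).

```python
import math

# Counts copied from the grade-by-subtype table above.
# Rows: grade 1, 2, 3. Columns: the five subtype columns, left to right.
table = [
    [29, 2, 1, 0, 1],
    [45, 26, 8, 6, 16],
    [15, 32, 16, 21, 67],
]


def chi_square(observed):
    """Return (Pearson chi-square statistic, degrees of freedom, sample size)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, n


def cramers_v(observed):
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    stat, _, n = chi_square(observed)
    k = min(len(observed), len(observed[0])) - 1
    return math.sqrt(stat / (n * k))


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability of a chi-square variable; even dof only (assumption)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form used here requires a positive even dof")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i  # term is now half**i / i!
        total += term
    return math.exp(-half) * total


stat, dof, n = chi_square(table)
v = cramers_v(table)
p = chi2_sf_even_dof(stat, dof)
print(f"n={n}, chi2={stat:.1f}, dof={dof}, Cramer's V={v:.2f}, p<0.0001: {p < 0.0001}")
```

With these counts the sketch gives a Cramer's V that rounds to 0.42 and a p-value far below 0.0001, consistent with the values reported in the table.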