Andrew E Teschendorff, Ali Naderi, Nuno L Barbosa-Morais, Sarah E Pinder, Ian O Ellis, Sam Aparicio, James D Brenton, Carlos Caldas.
Abstract
BACKGROUND: A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer.
Year: 2006 PMID: 17076897 PMCID: PMC1794561 DOI: 10.1186/gb-2006-7-10-r101
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Breast cancer data sets used
| Study | Cohort name | Platform | ER+ samples | Events (RIP/DM) |
| van de Vijver [18] | NKI2 | oligos Agilent | 226 | 45 |
| Wang [11] | EMC | oligos Affymetrix | 208 | 80 |
| Naderi [12] | NCH | oligos Agilent | 93 | 21 |
| Sotiriou [25] | JRH-1 | spotted cDNA | 65 | 20 |
| Miller [26] | UPP | oligos Affymetrix | 213 | 49 |
| Sotiriou [4] | JRH-2 | oligos Affymetrix | 72 | 17 |
Study, cohort name, microarray platform, number of ER+ patients and death (or surrogate distant metastasis) events among ER+ cases. The cohorts are described in [4,11,12,18,25,26].
Figure 1. (a) For each of 10 random partitions of the training cohorts into training and test sets, we rank the genes according to their average Cox-scores over the N training cohorts (N = 3). (b) 1, Definition of the MPI and evaluation of the optimal classifier(s) using the independent test sets of the training cohorts. 2, For each partition/realization p, the D-index of the top n-gene classifier is computed in the test set of each training cohort s. 3, These values are combined into a weighted average D-index over the test sets of the training cohorts, with weights based on the size of the test set of training cohort s. 4, The optimal classifier for each partition/realization p is defined by the number of top-ranked genes, n, that maximizes this weighted average D-index. (c) Validation of the optimal classifiers in completely independent external cohorts.
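The ranking and selection steps described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the absolute value of the average Cox score for ranking, the size-proportional weighting and all numbers are assumptions made for illustration, and the D-index values are taken as given inputs rather than computed from survival data.

```python
from typing import Dict, List


def rank_genes(cox_scores: Dict[str, List[float]]) -> List[str]:
    """Rank genes by the absolute value of their average Cox score
    across training cohorts, largest first (ranking by absolute value
    is an assumption of this sketch)."""
    def mean_abs(gene: str) -> float:
        scores = cox_scores[gene]
        return abs(sum(scores) / len(scores))
    return sorted(cox_scores, key=mean_abs, reverse=True)


def weighted_average_d_index(d_by_cohort: Dict[str, float],
                             test_sizes: Dict[str, int]) -> float:
    """Average the per-cohort D-indices, weighting each cohort by the
    size of its test set (assumed proportional weighting)."""
    total = sum(test_sizes[s] for s in d_by_cohort)
    return sum(test_sizes[s] * d for s, d in d_by_cohort.items()) / total


def optimal_gene_count(d_index: Dict[int, Dict[str, float]],
                       test_sizes: Dict[str, int]) -> int:
    """d_index[n][s] is the D-index of the top-n gene classifier in the
    test set of training cohort s. Return the n with the largest
    weighted average D-index."""
    return max(d_index,
               key=lambda n: weighted_average_d_index(d_index[n], test_sizes))


# Hypothetical inputs for one partition; none of these numbers come from the paper.
example_sizes = {"NKI2": 100, "EMC": 100, "NCH": 50}
example_d = {
    10: {"NKI2": 2.0, "EMC": 2.0, "NCH": 3.0},
    50: {"NKI2": 3.0, "EMC": 2.5, "NCH": 4.0},
    100: {"NKI2": 2.5, "EMC": 2.5, "NCH": 3.5},
}
print(optimal_gene_count(example_d, example_sizes))  # prints 50
```

In this toy input the 50-gene classifier has the largest weighted average D-index (3.0, against 2.2 and 2.7), so it would be selected for that partition.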
The D-index of prognostic factors across cohorts
| | Training | | | Test | | |
| Factor | NKI2 | EMC | NCH | JRH-1 | UPP | JRH-2* |
| Grade | 3.80 (<10^-5) | NA | 3.57 (0.001) | 3.84 (0.003) | 2.55 (0.0003) | 2.15 (0.17) |
| Node status | 1.01 (0.97) | all LN- | 2.23 (0.05) | 2.64 (0.04) | 4.03 (<10^-6) | 2.36 (0.25) |
| Size | 1.59 (0.06) | NA | 3.36 (0.003) | 4.16 (<10^-3) | 3.18 (<10^-5) | 3.04 (0.008) |
| NPI | 2.27 (<10^-3) | NA | 4.07 (<10^-3) | 5.16 (<10^-4) | 3.82 (<10^-7) | 3.78 (0.03) |
| MPI† | 3.32 (<10^-3) | 2.29 (0.002) | NA | 3.20 (0.002) | 2.71 (<10^-4) | 7.96 (<10^-4) |
| MPI‡ | 3.64 (<10^-7) | 2.56 (<10^-6) | 6.45 (<10^-5) | 3.44 (<10^-3) | 2.80 (<10^-4) | 11.26 (<10^-5) |
| MPI§ | 3.64 (<10^-6) | 2.51 (<10^-5) | 6.51 (<10^-5) | 3.10 (<10^-5) | 2.84 (0.001) | 10.10 (<10^-4) |
For the classical prognostic factors we give, where available, the D-index and log-rank test p values in the training cohorts NKI2, EMC and NCH, and test cohorts JRH-1, UPP and JRH-2. *For JRH-2 the numbers of samples with available grade and node status information were only 57 and 38, respectively. †For the MPI we give the median D-index and log-rank test p value over the ten molecular classifiers. The ranges of the D-index and p values over the 10 classifiers were: 2.27 to 4.35 (0.009 to 1.1 × 10^-5) in NKI2; 1.78 to 2.75 (0.024 to 2 × 10^-4) in EMC; 2.04 to 3.96 (0.039 to 0.0003) in JRH-1; 2.39 to 3.04 (1.7 × 10^-4 to 6.7 × 10^-6) in UPP; and 5.08 to 12.61 (8 × 10^-4 to 8.4 × 10^-6) in JRH-2. ‡The MPI based on the optimal 52-gene classifier. §The MPI based on the 17-gene classifier. NA, not available.
Figure 2. The MPI in the training and test cohorts. Heatmaps of relative gene expression (green = 'low', red = 'high') of the optimal 52-gene classifier and accompanying MPI distribution values across the three training cohorts (a) NKI2, (b) EMC and (c) NCH, and three test cohorts (d) JRH-1, (e) UPP and (f) JRH-2. The threshold shown for the MPI distributions was determined as explained in the text. Lower panels show the survival time distributions in the respective cohorts (black = 'death/poor outcome', grey = 'censored/good outcome').
Top prognostic genes in ER+ breast cancer
| UniGene symbol | Coefficient sign | Cytoband | GO |
| RACGAP1 | + | 12q13.12 | GTPase activator activity, electron transporter activity |
| STK6 | + | 20q13.2-q13.3 | ATP binding, mitosis, phosphorylation, kinase activity |
| HUMMLC2B | - | 16p11.2 | calcium ion binding, muscle myosin |
| MELK | + | 9p13.2 | ATP binding, phosphorylation, tyrosine kinase activity |
| PPARA | - | 22q12-q13.1 | Transcription factor, steroid hormone activity/lipid metabolism |
| DHCR7 | + | 11q13.2-q13.5 | cholesterol binding and biosynthesis, electron transporter activity |
| MAD2L1 | + | 4q27 | Cell-cycle, mitotic checkpoint, spindle |
| ZWINT | + | 10q21-q22 | Nucleus |
| KIF20A | + | 5q31 | ATP binding, microtubule associated complex |
| CDCA8 | + | 1p34.3 | Cytokinesis |
| KIAA0101 | + | 15q22.31 | |
| TIMELESS | + | 12q12-q13 | Development, negative regulation of transcription |
| PTTG1 | + | 5q35.1 | DNA metabolism, repair, replication and chromosome cycle |
| WSB2 | + | 12q24.23 | Intracellular signaling cascade |
| ABCC5 | + | 3q27 | ATP binding, ATPase activity, transmembrane movement |
| KIF23 | + | 15q23 | ATP binding, microtubule complex/motor activity, mitosis |
| H2AFY | + | 5q31.3-q32 | DNA binding, chromosome organization, nucleosome assembly |
Top ranked 17 prognostic genes in ER+ breast cancer as determined by a meta-analysis of three major breast cancer data sets. We give the sign of their global average Cox-regression coefficient ('+' means upregulated in poor outcome tumors; '-' means downregulated in poor outcome tumors), cytoband position and selected abbreviated Gene Ontology.
Figure 3. Kaplan-Meier survival curves in cohorts. Kaplan-Meier survival curves for the two prognostic groups derived from PAM clustering (k = 2) [28] on the molecular prognostic index distribution in the three training cohorts (a) NKI2, (b) EMC and (c) NCH, and three external cohorts (d) JRH-1, (e) UPP and (f) JRH-2. We also give the hazard ratio (HR), the associated 95% CI, and the number of events (death or distant metastasis) and number of distinct data points in each prognostic group.
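For readers unfamiliar with how curves such as those in Figure 3 are obtained, the following is a minimal Python sketch of the standard Kaplan-Meier product-limit estimator for a single prognostic group. The function name and the follow-up data are hypothetical and are not taken from the paper; in practice the estimator would be applied separately to the low-MPI and high-MPI groups.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if the event (death or distant metastasis) was observed,
             0 if the patient was censored at that time
    Returns (time, estimated survival probability) at each distinct
    time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (years) for one prognostic group.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# prints [(1, 0.75), (3, 0.375), (4, 0.0)]
```

The censored patient at time 2 contributes no drop in the curve but reduces the number at risk for later event times, which is why the step at time 3 is from 0.75 to 0.375.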
Multivariate D-index analysis
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Opt. |
| JRH-1 | | | | | | | | | | | |
| MPI (A) | 0.21 | 0.19 | 0.03 | 0.03 | 0.05 | 0.06 | 0.43 | 0.12 | 0.04 | 0.16 | 0.15 |
| Grade | 0.06 | 0.07 | 0.11 | 0.05 | 0.10 | 0.11 | 0.05 | 0.08 | 0.10 | 0.10 | 0.05 |
| Node status | 0.79 | 0.73 | 0.96 | 0.86 | 0.93 | 0.89 | 0.65 | 0.87 | 0.83 | 0.91 | 0.91 |
| Size | 0.07 | 0.01 | 0.02 | 0.05 | 0.03 | 0.05 | 0.04 | 0.02 | 0.02 | 0.11 | 0.11 |
| MPI (B) | 0.22 | 0.31 | 0.08 | 0.06 | 0.13 | 0.14 | 0.42 | 0.18 | 0.11 | 0.16 | 0.16 |
| NPI | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 | 0.01 | 0.01 |
| UPP | | | | | | | | | | | |
| MPI (A) | 0.01 | 0.01 | 0.06 | 0.01 | 0.02 | 0.01 | <0.005 | 0.01 | 0.01 | 0.01 | 0.01 |
| Grade | 0.72 | 0.84 | 0.85 | 0.90 | 0.98 | 0.99 | 0.73 | 0.83 | 0.85 | 0.88 | 0.83 |
| Node status | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 |
| Size | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.02 |
| MPI (B) | 0.02 | 0.03 | 0.12 | 0.02 | 0.04 | 0.03 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
| NPI | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 |
| JRH-2 | | | | | | | | | | | |
| MPI (A) | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.01 | 0.04 | 0.01 | <0.005 | 0.01 |
| Grade | 0.19 | 0.10 | 0.24 | 0.24 | 0.07 | 0.15 | 0.23 | 0.13 | 0.21 | 0.43 | 0.10 |
| Node status | 0.44 | 0.16 | 0.76 | 0.45 | 0.24 | 0.35 | 0.67 | 0.50 | 0.25 | 0.37 | 0.24 |
| Size | 0.12 | 0.73 | 0.28 | 0.28 | 0.93 | 0.82 | 0.27 | 0.47 | 0.35 | 0.36 | 0.97 |
| MPI (B) | <0.005 | <0.005 | 0.01 | 0.01 | <0.005 | 0.01 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 |
| NPI* | 0.51 | 0.53 | 0.75 | 0.58 | 0.97 | 0.78 | 0.99 | 0.63 | 0.50 | 0.47 | 0.96 |
Given are the rounded p values (to two significant digits) of the D-indices for two multivariate models in the three external cohorts JRH-1, UPP and JRH-2: model A is log(h(t)) ~ (Grade) + (NodeStatus) + (TumorSize) + MPI, and model B is log(h(t)) ~ NPI + MPI. Columns label the 10 different derived molecular classifiers, depending on the training-test set partition p used, and the optimal 52-gene classifier. *For JRH-2 only 36 samples with NPI information were available. Opt., optimal.
The prognostic added value of the MPI
| Model | JRH-1 | UPP | JRH-2† |
| Grade | 3.85 | 2.55 | 2.15 |
| Grade + MPI* | 5.85 (2.49-13.72) | 2.85 (1.79-4.51) | 10.60 (2.79-40.20) |
| Grade + MPI** | 4.62 (2.15-9.91) | 2.90 (1.83-4.58) | 8.14 (2.11-31.31) |
| Node Status | 2.64 | 4.03 | 2.36 |
| Node Status + MPI* | 2.98 (1.48-6.01) | 4.71 (2.83-7.86) | 14.07 (2.08-94.84) |
| Node Status + MPI** | 3.09 (1.53-6.23) | 4.40 (2.74-7.06) | 11.79 (1.86-74.44) |
| Size | 4.16 | 3.18 | 3.04 |
| Size + MPI* | 5.40 (2.51-11.62) | 3.41 (2.16-5.38) | 5.21 (2.29-11.84) |
| Size + MPI** | 4.88 (2.35-10.13) | 3.65 (2.29-5.81) | 4.51 (2.08-9.77) |
| NPI | 5.16 | 3.82 | 3.78 |
| NPI + MPI* | 5.85 (2.54-13.47) | 4.02 (2.52-6.41) | 19.11 (2.62-139.3) |
| NPI + MPI** | 4.93 (2.30-10.56) | 4.04 (2.53-6.44) | 25.23 (2.53-251.6) |
For each standard prognostic index SPI (grade, node status, size and NPI) we compare its D-index with the D-index of the corresponding equal-weight hybrid prognostic model, defined by a hybrid prognostic index HPI, where HPI ~ SPI + MPI* or HPI ~ SPI + MPI** (see Materials and methods). 95% CIs for the hybrid prognostic model D-index values are shown in brackets. MPI* denotes the index of the optimal 52-gene classifier. MPI** denotes the index of the 17-gene classifier. †For JRH-2 only 36 samples with NPI information were available.
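One way to read "equal-weight" in the hybrid index HPI ~ SPI + MPI is sketched below in Python. The exact construction is given in the paper's Materials and methods, which are not reproduced here, so the standardization step (z-scoring each index before summing), the function names and the input values are all assumptions of this sketch rather than the authors' procedure.

```python
from statistics import mean, pstdev
from typing import List


def zscores(values: List[float]) -> List[float]:
    """Centre and scale a list of values to mean 0 and unit
    (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant index")
    return [(v - m) / s for v in values]


def hybrid_index(spi: List[float], mpi: List[float]) -> List[float]:
    """Assumed equal-weight hybrid: the sum of the standardized standard
    prognostic index (SPI) and the standardized molecular prognostic
    index (MPI), one value per patient."""
    if len(spi) != len(mpi):
        raise ValueError("SPI and MPI must cover the same patients")
    return [a + b for a, b in zip(zscores(spi), zscores(mpi))]


# Hypothetical per-patient values; not taken from the paper.
grade = [1.0, 2.0, 3.0, 3.0]
mpi = [-0.8, 0.1, 0.4, 1.2]
print(hybrid_index(grade, mpi))
```

Standardizing first puts both indices on the same scale, so that neither dominates the sum simply because of its units; a different scaling would give a different hybrid index.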