Seon-Young Kim, Yong Sung Kim.
Abstract
BACKGROUND: Gene expression profiling is a promising approach to better estimate patient prognosis; however, there are still unresolved problems, including little overlap among similarly developed gene sets and poor performance of a developed gene set in other datasets.
Year: 2008 PMID: 18416850 PMCID: PMC2364634 DOI: 10.1186/1471-2164-9-177
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Breast cancer datasets analyzed in this study
| Study | Platform | Samples | Data source |
| Bild | Affymetrix | 169 | *GSE3143 |
| Miller | Affymetrix | 251 | GSE3494 |
| Oh | Oligos Agilent | 67 | |
| Pawitan | Affymetrix | 159 | GSE1456 |
| Sorlie_1 | Spotted cDNA | 76 | GSE3193 |
| Sorlie_2 | Spotted cDNA | 39 | |
| Sotiriou_1 | Spotted cDNA | 99 | |
| Sotiriou_2 | Affymetrix | 187 | GSE2990 |
| Van de Vijver | Oligos Agilent | 295 | |
| Wang | Affymetrix | 286 | GSE2034 |
| Weigelt | Oligos Agilent | 79 | |
| West | Affymetrix | 49 | |
| Total | | 1756 | |
*GSE: gene expression series number in GEO (Gene Expression Omnibus)
Number of gene sets in each category
| Category | Number |
| GO Biological Process (BP) | 735 |
| GO Molecular Functions (MF) | 648 |
| Biological Pathways | 198 |
| InterPro Domains | 798 |
| Breast and other Cancer Signatures | 32 |
| Total | 2411 |
Thirty-two prognostic gene sets prepared from published reports
| Gene set | Number (reported) | Number (unique) | Reference |
| *11823860_ST2 | 231 | 164 | van't Veer et al. [13] |
| 11823860_ST3 | 2,460 | 1,818 | van't Veer et al. [13] |
| 11823860_ST4 | 430 | 314 | van't Veer et al. [13] |
| 12490681_70 | 70 | 50 | van de Vijver [1] |
| 12747878_ST2 | 177 | 144 | Huang et al. [52] |
| 12747878_ST3 | 168 | 160 | Huang et al. [52] |
| 12917485_ST6 | 606 | 564 | Sotiriou et al. [18] |
| 12917485_ST7 | 137 | 126 | Sotiriou et al. [18] |
| 12917485_ST8 | 706 | 635 | Sotiriou et al. [18] |
| 12917485_ST9 | 485 | 402 | Sotiriou et al. [18] |
| 14737219_CSR | 512 | 459 | Chang et al. [3] |
| 14737219_USR | 677 | 611 | Chang et al. [3] |
| 15034139_T2 | 45 | 31 | Zhao et al. [53] |
| 15073102_4 | 4 | 4 | Glinsky et al. [54] |
| 15073102_6 | 6 | 6 | Glinsky et al. [54] |
| 15073102_13 | 12 | 12 | Glinsky et al. [54] |
| 15073102_14 | 14 | 14 | Glinsky et al. [54] |
| 15591335_F1 | 21 | 21 | Paik et al. [6] |
| 15721473_T3 | 76 | 68 | Wang et al. [2] |
| 15931389_T3_stem | 11 | 11 | Glinsky et al. [55] |
| 15931389_ST2_14 | 14 | 14 | Glinsky et al. [55] |
| 15931389_ST2_CNS | 11 | 11 | Glinsky et al. [55] |
| 16141321_SDC2 | 500 | 398 | Miller et al. [19] |
| 16273092_catenin | 98 | 76 | Bild et al. [20] |
| 16273092_E2F3 | 298 | 238 | Bild et al. [20] |
| 16273092_myc | 332 | 192 | Bild et al. [20] |
| 16273092_RAS | 348 | 248 | Bild et al. [20] |
| 16273092_SRC | 75 | 58 | Bild et al. [20] |
| 16280042_AF1 | 64 | 61 | Pawitan et al. [16] |
| 16478745_ST1 | 242 | 207 | Sotiriou et al. [15] |
| 16707453_ST3 | 101 | 86 | Schuetz et al. [56] |
| 17076897_ADF3 | 52 | 52 | Teschendorff et al. [24] |
*The eight-digit number is the PubMed ID of the reference
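Because each gene set name starts with a PubMed ID, the name can be split back into its source reference and its table or signature label. The helper below is a minimal sketch of that convention; the function name `split_gene_set_id` is hypothetical and not part of the original study.

```python
def split_gene_set_id(gene_set_id):
    """Split '<PMID>_<label>' into (pmid, label).

    Assumes the convention used in the table: an eight-digit PubMed ID,
    an underscore, then a free-form label that may itself contain underscores.
    """
    pmid, sep, label = gene_set_id.partition("_")
    if len(pmid) != 8 or not pmid.isdigit() or not sep or not label:
        raise ValueError(f"not a PMID-prefixed gene set ID: {gene_set_id!r}")
    return pmid, label


# Two names taken from the table above.
print(split_gene_set_id("11823860_ST2"))
print(split_gene_set_id("15931389_ST2_14"))
```

Splitting only on the first underscore keeps compound labels such as `ST2_14` intact.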
Top 20 prognostic gene sets identified by two-means clustering in breast cancer gene expression datasets
| Gene set | *category | Bild | Miller | Oh | Pawitan | Sorlie_1 | Sorlie_2 | Sotiriou_1 | Sotiriou_2 | van de Vijver | Wang | Weigelt | West | #freq | %mean |
| 11823860_ST2 | BR | 1.32 | 7.21 | 10.02 | 22.68 | 8.87 | 0.44 | 4.51 | 8.18 | 45.51 | 8.19 | 0 | 0.97 | 8 | 9.83 |
| mitotic checkpoint | BP | 7.51 | 13.34 | 2.91 | 13.57 | 0.07 | 0.03 | 4.08 | 9.59 | 30.78 | 12.49 | 0.01 | 3.57 | 7 | 8.16 |
| Cell_cycle_KEGG_GenMAPP | PW | 7.2 | 12.05 | 2.08 | 11.46 | 4.28 | 0.31 | 2.75 | 9.33 | 40.26 | 6.93 | 0.01 | 0.03 | 7 | 8.06 |
| cell division | BP | 4.37 | 10.47 | 3.46 | 13.81 | 6.05 | 0 | 2.14 | 7.69 | 32.14 | 15.18 | 0.02 | 0.06 | 7 | 7.95 |
| cation efflux protein | IP | 7.94 | 9.69 | 2.16 | 15.77 | 4.16 | 2.41 | 1.96 | 10.45 | 24.69 | 10.04 | 0.51 | 0.21 | 7 | 7.5 |
| cyclin, C-terminal | IP | 3.88 | 15.25 | 6.72 | 16.84 | 5.65 | 0.07 | 2.64 | 3.84 | 21.12 | 7.7 | 0.69 | 0.23 | 7 | 7.05 |
| DNA repair | BP | 2.04 | 7.15 | 4.58 | 9.13 | 0.09 | 0.02 | 6.61 | 8.4 | 35.15 | 6.97 | 0.13 | 1.53 | 7 | 6.82 |
| cyclin, N-terminal domain | IP | 3.4 | 15.66 | 2.50 | 10.93 | 5.5 | 0.3 | 4.37 | 3.91 | 26.72 | 7.28 | 1.03 | 0.03 | 7 | 6.8 |
| protein tyrosine phosphatase activity | MF | 9.45 | 4.03 | 8.29 | 9.19 | 4.51 | 0.55 | 2.55 | 0.46 | 24.1 | 9.25 | 0 | 2.84 | 7 | 6.27 |
| protein domain specific binding | MF | 6.56 | 5.32 | 0 | 10.73 | 0.14 | 1.24 | 12.14 | 6.81 | 15.53 | 2.69 | 10.09 | 0.66 | 7 | 5.99 |
| DNA metabolism | BP | 4.08 | 8.01 | 1.81 | 10.15 | 0.06 | 0.17 | 5.06 | 4.88 | 26.2 | 8.88 | 0.64 | 0.3 | 7 | 5.85 |
| identical protein binding | MF | 0.18 | 8.04 | 8.35 | 9.55 | 0.12 | 5.3 | 8.79 | 4.09 | 19.64 | 0.01 | 0.97 | 0.12 | 7 | 5.43 |
| water transport | BP | 0.01 | 10.23 | 5.85 | 5.26 | 0.45 | 5.25 | 1.57 | 0.42 | 3.97 | 0.91 | 6.13 | 5.14 | 7 | 3.77 |
| 17076897_ADF3 | BR | 3.15 | 14.49 | 4.06 | 19.84 | 0.1 | 2.59 | 2.31 | 12.03 | 48.93 | 18.04 | 0 | 2.39 | 6 | 10.66 |
| mitosis | BP | 6.45 | 13.81 | 2.22 | 16.95 | 1.35 | 0.1 | 2.23 | 9.05 | 37.87 | 11.43 | 0 | 0.16 | 6 | 8.47 |
| 16478745_ST1 | BR | 5.49 | 10.52 | 3.2 | 13.32 | 1.04 | 0 | 2.55 | 11.64 | 39.05 | 12.51 | 0.29 | 0.53 | 6 | 8.35 |
| Pyrimidine metabolism_KEGG | PW | 4.28 | 7.84 | 4.35 | 25.6 | 0.61 | 0.46 | 2.07 | 8.04 | 42.75 | 3.12 | 0 | 0.77 | 6 | 8.32 |
| 14737219_USR | BR | 4.27 | 10.25 | 3.72 | 13.61 | 0.99 | 0.06 | 1.96 | 10.86 | 39.37 | 11.4 | 0.16 | 0.2 | 6 | 8.07 |
| cytokinesis | BP | 2.6 | 8.9 | 3.91 | 17.06 | 0.03 | 0.91 | 0.22 | 8.68 | 48.68 | 5.24 | 0.16 | 0.17 | 6 | 8.05 |
| 14737219_CSR | BR | 0.51 | 10.7 | 3.13 | 15.5 | 7.45 | 0.08 | 4.25 | 1.43 | 39.65 | 7.32 | 0.38 | 2.3 | 6 | 7.73 |
Values are chi-square values from log-rank test.
#freq: The number of datasets in which the chi-square value exceeds 3.84 (the P = 0.05 threshold for one degree of freedom)
*category: BP-GO Biological Processes, BR-Breast cancer prognostic signatures, MF-GO Molecular Function, PW-KEGG and GenMAPP pathways, IP-InterPro domains
%mean: Mean of 12 chi-square values
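The two summary columns can be recomputed from any row of chi-square values. The sketch below does this for the 11823860_ST2 row; the function name and the `threshold` parameter are illustrative choices, not taken from the original analysis.

```python
# Chi-square values for 11823860_ST2 across the 12 datasets (copied from the table).
CHI2_ST2 = [1.32, 7.21, 10.02, 22.68, 8.87, 0.44, 4.51, 8.18, 45.51, 8.19, 0, 0.97]


def summarize_chi2(values, threshold=3.84):
    """Return (freq, mean): how many values exceed the threshold, and their mean."""
    freq = sum(1 for v in values if v > threshold)
    mean = sum(values) / len(values)
    return freq, mean


freq, mean = summarize_chi2(CHI2_ST2)
print(freq, mean)
```

For this row the result agrees with the tabulated #freq of 8 and %mean of 9.83, up to rounding.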
Figure 1. Kaplan-Meier survival curves for the two prognostic classes of breast cancers. In each dataset, patients were divided into two groups (poor and good prognostic groups) based on the gene expression pattern in the 11823860_ST2 gene set, and their survival or recurrence proportions were then plotted. The log-rank test was used to infer the statistical significance of survival or recurrence differences between the two groups. In each graph, the x-axis represents overall or relapse-free survival years and the y-axis represents the proportion of overall survival (A, B, C, D, E, F, I, and K) or relapse-free survival (G, H, J, and L). Black indicates poor prognosis and red indicates good prognosis.
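For readers who want to see how a log-rank chi-square of this kind is obtained, here is a minimal sketch of the standard two-group statistic. It is a textbook formulation, not the authors' code: the software they used and their handling of ties are not stated here, and the toy cohort at the bottom is hypothetical.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (one degree of freedom).

    times  : follow-up time per patient
    events : 1 if the event (death or relapse) was observed, 0 if censored
    groups : 1 for one prognostic class, 0 for the other
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e)         # events at time t
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


# Hypothetical toy cohort: four patients, all events observed.
chi2 = logrank_chi2(times=[1, 2, 3, 4], events=[1, 1, 1, 1], groups=[1, 1, 0, 0])
significant = chi2 > 3.84  # P < 0.05 at one degree of freedom
print(chi2, significant)
```

The statistic is symmetric in the group labels, so swapping which class is coded as 1 leaves the chi-square value unchanged.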
Hazard ratios and P values for the top three gene signatures in 12 datasets
| Datasets | 11823860_ST2 | Mitotic checkpoint | Cell_cycle_KEGG |
| Bild | 2.88 (0.686–12.1) p = 0.148 | 1.13 (0.407–3.11) p = 0.819 | |
| Miller | 1.29 (0.297–5.63) p = 0.731 | 0.942 (0.269–3.3) p = 0.925 | 1.37 (0.547–3.41) p = 0.504 |
| Oh | 4.72 (0.834–26.7) p = 0.0794 | 3.87 (0.792–18.9) p = 0.0944 | 2.07 (0.728–5.9) p = 0.172 |
| Pawitan | | | |
| Sorlie_1 | | | |
| Sorlie_2 | 3.28 (0.29–46.9) p = 0.381 | 1.99 (0.308–12.8) p = 0.471 | 1.33 (0.319–5.57) p = 0.695 |
| Sotiriou_1 | 27.3 (2.60–287) p = 0.0582 | | |
| Sotiriou_2 | | | |
| Van de Vijver | | | |
| Wang | | | |
| Weigelt | 2.00 (0.152–26.0) p = 0.597 | 1.40 (0.15–13.0) p = 0.769 | 1.25 (0.19–3.38) p = 0.764 |
| West | 15.5 (0.73–329) p = 0.788 | 5.56 (0.635–12.1) p = 0.121 | |
*Values in parentheses are 95% confidence intervals
#Bolded data entries are significant at P < 0.05.
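A hazard ratio, its 95% confidence interval, and its P value are linked: an interval that includes 1 corresponds to P > 0.05. The sketch below recovers an approximate P value from a reported ratio and interval. It assumes the interval is a Wald interval, symmetric on the log scale, which the source does not state; the function name `wald_p_from_ci` is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from a hazard ratio and its 95% CI.

    Assumption: the interval is exp(log(hr) +/- z_crit * se), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


# First Bild entry as printed in the table: 2.88 (0.686-12.1), p = 0.148
p = wald_p_from_ci(2.88, 0.686, 12.1)
print(round(p, 3))
```

Under this assumption the Bild entry gives a value close to the tabulated p = 0.148. Entries where the recomputed and printed values disagree strongly may reflect a different test or a transcription problem in the table.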
Prediction accuracy of the 11823860_ST2 gene set in external validation
| training | testing | *GTG | GTP | PTG | PTP | **accuracy | sensitivity | specificity |
| Bild | Miller | 128 | 49 | 17 | 19 | 0.6901 | 0.7232 | 0.5278 |
| Bild | Pawitan | 89 | 41 | 7 | 15 | 0.6842 | 0.6846 | 0.6818 |
| Bild | Sotiriou_2 | 85 | 32 | 11 | 17 | 0.7034 | 0.7265 | 0.6071 |
| Bild | Van de Vijver | 165 | 67 | 11 | 37 | 0.7214 | 0.7112 | 0.7708 |
| Bild | Wang | 128 | 55 | 42 | 51 | 0.6486 | 0.6995 | 0.5484 |
| Miller | Bild | 37 | 24 | 17 | 17 | 0.5684 | 0.6066 | 0.5 |
| Miller | Pawitan | 84 | 46 | 6 | 16 | 0.6579 | 0.6462 | 0.7273 |
| Miller | Sotiriou_2 | 77 | 40 | 7 | 21 | 0.6759 | 0.6581 | 0.75 |
| Miller | Van de Vijver | 165 | 67 | 11 | 37 | 0.7214 | 0.7112 | 0.7708 |
| Miller | Wang | 125 | 58 | 42 | 51 | 0.6377 | 0.6831 | 0.5484 |
| Pawitan | Bild | 43 | 18 | 19 | 15 | 0.6105 | 0.7049 | 0.4412 |
| Pawitan | Miller | 133 | 44 | 19 | 17 | 0.7042 | 0.7514 | 0.4722 |
| Pawitan | Sotiriou_2 | 87 | 30 | 11 | 17 | 0.7172 | 0.7436 | 0.6071 |
| Pawitan | Van de Vijver | 173 | 59 | 12 | 36 | 0.7464 | 0.7457 | 0.75 |
| Pawitan | Wang | 135 | 48 | 51 | 42 | 0.6413 | 0.7377 | 0.4516 |
| Sotiriou_2 | Bild | 38 | 23 | 18 | 16 | 0.5684 | 0.623 | 0.4706 |
| Sotiriou_2 | Miller | 129 | 48 | 19 | 17 | 0.6854 | 0.7288 | 0.4722 |
| Sotiriou_2 | Pawitan | 86 | 44 | 10 | 12 | 0.6447 | 0.6615 | 0.5455 |
| Sotiriou_2 | Van de Vijver | 164 | 68 | 12 | 36 | 0.7143 | 0.7069 | 0.75 |
| Sotiriou_2 | Wang | 131 | 52 | 43 | 50 | 0.6558 | 0.7158 | 0.5376 |
| Van de Vijver | Bild | 41 | 20 | 21 | 13 | 0.5684 | 0.6721 | 0.3824 |
| Van de Vijver | Miller | 136 | 41 | 21 | 15 | 0.7089 | 0.7684 | 0.4167 |
| Van de Vijver | Pawitan | 99 | 31 | 12 | 10 | 0.7171 | 0.7615 | 0.4545 |
| Van de Vijver | Sotiriou_2 | 88 | 29 | 15 | 13 | 0.6966 | 0.7521 | 0.4643 |
| Van de Vijver | Wang | 141 | 42 | 54 | 39 | 0.6522 | 0.7705 | 0.4194 |
| Wang | Bild | 34 | 27 | 16 | 18 | 0.5474 | 0.5574 | 0.5294 |
| Wang | Miller | 123 | 54 | 14 | 22 | 0.6808 | 0.6949 | 0.6111 |
| Wang | Pawitan | 81 | 49 | 6 | 16 | 0.6382 | 0.6231 | 0.7273 |
| Wang | Sotiriou_2 | 76 | 41 | 7 | 21 | 0.669 | 0.6496 | 0.75 |
| Wang | Van de Vijver | 154 | 78 | 8 | 40 | 0.6929 | 0.6638 | 0.8333 |
| Total | | 3175 | 1325 | 559 | 746 | 0.6755 | 0.7056 | 0.5716 |
*GTG – Good prognosis group predicted as Good; GTP – Good prognosis group predicted as Poor; PTG – Poor prognosis group predicted as Good; PTP – Poor prognosis group predicted as Poor
**accuracy = (GTG+PTP)/(GTG+GTP+PTG+PTP); sensitivity = GTG/(GTG+GTP); specificity = PTP/(PTG+PTP)
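The three formulas in the footnote can be checked directly against the table. The sketch below applies them to the Bild-to-Miller row and to the Total row. Note that, in this table's convention, the good-prognosis group plays the role of the positive class. The function name is illustrative.

```python
def prediction_metrics(gtg, gtp, ptg, ptp):
    """Return (accuracy, sensitivity, specificity) from the four confusion counts.

    gtg / gtp : good-prognosis patients predicted as good / poor
    ptg / ptp : poor-prognosis patients predicted as good / poor
    """
    total = gtg + gtp + ptg + ptp
    accuracy = (gtg + ptp) / total
    sensitivity = gtg / (gtg + gtp)
    specificity = ptp / (ptg + ptp)
    return accuracy, sensitivity, specificity


# Training on Bild, testing on Miller (first row of the table).
acc, sens, spec = prediction_metrics(128, 49, 17, 19)
print(round(acc, 4), round(sens, 4), round(spec, 4))

# Pooled counts from the Total row.
total_acc, total_sens, total_spec = prediction_metrics(3175, 1325, 559, 746)
print(round(total_acc, 4), round(total_sens, 4), round(total_spec, 4))
```

Both calls reproduce the tabulated values to four decimal places.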
Top 20 gene sets with high prediction accuracy (analysis with six datasets)
| Gene set | category | GTG | GTP | PTG | PTP | accuracy | sensitivity | specificity |
| 11823860_ST2 | br | 3175 | 1325 | 559 | 746 | 0.6755 | 0.7056 | 0.5716 |
| transferase activity | mf | 3264 | 1236 | 658 | 647 | 0.6737 | 0.7253 | 0.4958 |
| ligase activity | mf | 3204 | 1296 | 633 | 672 | 0.6677 | 0.712 | 0.5149 |
| 11823860_ST3 | br | 3200 | 1300 | 632 | 673 | 0.6672 | 0.7111 | 0.5157 |
| transcription factor activity | mf | 3268 | 1232 | 701 | 604 | 0.667 | 0.7262 | 0.4628 |
| 16141321_SDC2 | br | 3169 | 1331 | 607 | 698 | 0.6661 | 0.7042 | 0.5349 |
| oxidoreductase activity | mf | 3209 | 1291 | 648 | 657 | 0.666 | 0.7131 | 0.5034 |
| 14737219_CSR | br | 3165 | 1335 | 606 | 699 | 0.6656 | 0.7033 | 0.5356 |
| 12917485_ST9 | br | 3162 | 1338 | 611 | 694 | 0.6643 | 0.7027 | 0.5318 |
| catalytic activity | mf | 3209 | 1291 | 661 | 644 | 0.6637 | 0.7131 | 0.4935 |
| RNA polymerase II transcription factor activity | mf | 3235 | 1265 | 689 | 616 | 0.6634 | 0.7189 | 0.472 |
| transport | bp | 3186 | 1314 | 645 | 660 | 0.6625 | 0.708 | 0.5057 |
| transcription | bp | 3241 | 1259 | 701 | 604 | 0.6624 | 0.7202 | 0.4628 |
| transporter activity | mf | 3171 | 1329 | 631 | 674 | 0.6624 | 0.7047 | 0.5165 |
| 14737219_USR | br | 3094 | 1406 | 555 | 750 | 0.6622 | 0.6876 | 0.5747 |
| 12917485_ST7 | br | 3140 | 1360 | 602 | 703 | 0.662 | 0.6978 | 0.5387 |
| ATP binding | mf | 3185 | 1315 | 647 | 658 | 0.662 | 0.7078 | 0.5042 |
| kinase activity | mf | 3205 | 1295 | 669 | 636 | 0.6617 | 0.7122 | 0.4874 |
| metabolism | bp | 3199 | 1301 | 666 | 639 | 0.6612 | 0.7109 | 0.4897 |
| regulation of progression through cell cycle | bp | 3108 | 1392 | 575 | 730 | 0.6612 | 0.6907 | 0.5594 |
*category: br – breast and other cancer gene set; mf – molecular functions; bp – biological processes
**GTG – Good prognosis group predicted as Good; GTP – Good prognosis group predicted as Poor; PTG – Poor prognosis group identified as Good; PTP – Poor prognosis group identified as Poor
^ accuracy = (GTG+PTP)/(GTG+GTP+PTG+PTP); sensitivity = GTG/(GTG+GTP); specificity = PTP/(PTG+PTP)
Figure 2. Comparison of gene set sizes between the best prognostic gene sets (group 1) and the best predictive gene sets (group 2). Box plots show the number of genes in the top 20 gene sets for group discrimination (PROG) and the top 20 gene sets for prediction accuracy (PRED). The P-value was inferred from an unpaired t-test.