Jie Cheng, Joel Greshock, Leming Shi, Shu Zheng, Alan Menius, Kwan Lee.
Abstract
BACKGROUND: Biomarker discovery holds promise for advancing personalized medicine, because biomarkers can help match patients to the optimal treatment and so improve patient outcomes. However, serious concerns have been raised because very few molecular biomarkers or signatures discovered from high-dimensional array data have been successfully validated and applied in clinical use. We propose good practice guidelines as well as a novel tool for biomarker discovery, and use breast cancer prognosis as a case study to illustrate the proposed approach.
Year: 2013 PMID: 24565120 PMCID: PMC3854673 DOI: 10.1186/1752-0509-7-S4-S2
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Summary of the two data sets involved in this experiment
| Data set | van de Vijver | TRANSBIG |
|---|---|---|
| Use | biomarker discovery | biomarker validation |
| Platform | Agilent Hu25K | Affymetrix HG-U133A |
| Sample size | 295 | 198 |
| Age (years) | <53 | <61; mean = 46 |
| ER+/ER- | 226/69 | 134/64 |
| Node+/Node- | 144/151 | 0/198 |
| Adjuvant treatment | some | none |
| Data source | http://bioinformatics.nki.nl/data.php | GSE7390 |
Candidate markers identified from the van de Vijver data set using the proposed method
| Group | Sample size n (good prog + poor prog) | Nested CV | ||
|---|---|---|---|---|
| BIRC5, CCNB2, CENPA, TK1, CCNE2, DKFZp762E1312, PRC1, STK15, SLC16A3, BUB1 | CEGP1, SLC11A3, C4A, ZNF145, MATN3, PGR, RAI2, DLX2 | |||
| H1F2, COX6C, H2BFB, CCNE2, BLVRB | FST, DIO3, NTN4, DLX2, MATN3, COL3A1 | |||
| H1F2, H2BFB, HA2FO, H2AFA, HABFB, KFZp762E1312, H2BFS | LTF, NTN4, HML2, PER1, DMBT1, ODZ2, WNT5A, SEMA3C | |||
| PRAME, FADSD6, TK1, TSSC3, CTSL2, BUB1 | CEGP1, ESR1, CYP4B1, SEC14L2, TBX3-iso, ZNF145 | |||
| H1F2, H2BFB, H2AFP, H2AFA, H2BFB, COX6C, MSMB, BLVRB, , BCAS1 | LTF, LAMB3, C4A, NTN4, PTPRK, RTN1 | |||
Many genes discovered in larger groups can also be discovered in their subgroups. For example, BIRC5 can be discovered in most of the subgroups. These genes are not listed again in the subgroups unless they are more significant there. Conversely, a gene may be listed in a larger group only because it is significant in one of its subgroups. For example, H1F2 is listed in the lymph node-positive group only because it is significant in the ER+/Node+ subgroup. The nested CV performance is listed with its estimated standard error.
Validation performance (AUROCC) of the candidate biomarkers in the TRANSBIG data set
| Prognostic factors | TRANSBIG | |||
|---|---|---|---|---|
| | TDM at 5 yrs | | TDM at 10 yrs | |
| | Node- | Node-/ER+ | Node- | Node-/ER+ |
| 202705_at(CCNB2) | ||||
| 209642_at(BUB1) | ||||
| 204962_s_at(CENPA) | ||||
| 203362_s_at(MAD2L1) | ||||
| 202095_s_at(BIRC5) | ||||
| 210074_at(CTSL2) | ||||
| 209803_s_at(PHLDA2, TSSC3) | 0.62 | 0.59 | 0.61 | |
| 202338_at (TK1) | 0.60 | |||
| 204086_at(PRAME) | 0.62 | 0.57 | 0.58 | |
| 202218_s_at (FADSD6) | 0.50 | 0.49 | 0.50 | 0.45 |
| 0.59 | ||||
| 0.59 | ||||
| 0.55 | 0.56 | |||
| 0.62 | 0.58 | 0.59 | ||
| 0.59 | 0.59 | 0.56 | 0.56 | |
| 16-gene signature | ||||
| 70-gene signature | NA | NA | NA | |
| Nottingham Prognostic Index Score | ||||
| Adjuvant! Online 10 year OS prob. | ||||
| 76-gene signature | ||||
| Tumor grade | 0.63 | 0.62 | ||
| Tumor Size | ||||
| 212021_s_at(MKI67) | ||||
| 205225_at (ESR1) | 0.58 | 0.59 | 0.57 | 0.61 |
| Age | 0.53 | 0.47 | 0.52 | 0.51 |
There are two AUROCC numbers for each gene at each endpoint. The first number is from the whole validation set, in which 100% of patients are Node-; the second is from the Node-/ER+ subset of the validation set. The numbers in bold font are significant at the 95% confidence level. The top portion of the table contains 10 genes with one direction of effect (overexpression → poor prognosis). The middle portion contains 10 genes with the opposite direction (overexpression → good prognosis). The bottom portion contains our signature based on all 20 genes, together with other prognostic factors. The performance of the 70-gene signature for the TRANSBIG data set is copied from [8]. The performance of the 76-gene signature is based on a binary prediction of "good prognosis" or "poor prognosis" for each patient. Among the listed 20 genes, three genes (CENPA, GSTM3 and CEGP1) were included in the 70-gene signature [9] and four genes (BIRC5, PGR, SCUBE2 and CTSL2) were included in the 16-gene signature. There is no overlap between our 20-gene signature and the 76-gene signature.
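As an illustration of how a per-gene AUROCC with a stated direction of effect can be computed, here is a minimal sketch. It assumes the endpoint has already been reduced to a binary label (event by the time horizon or not) and ignores censoring, which the original analysis may handle differently. The function names `auroc` and `marker_auroc` and the toy values are hypothetical and are not taken from the paper.

```python
def auroc(scores, events):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen event case has a higher
    score than a randomly chosen non-event case (ties count one half).
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def marker_auroc(expression, events, high_is_poor=True):
    """AUROC of a single gene used as a risk score.

    For genes where overexpression indicates good prognosis, the sign is
    flipped so that a higher score always means higher predicted risk.
    """
    scores = expression if high_is_poor else [-x for x in expression]
    return auroc(scores, events)


# Toy data (hypothetical): expression of one probe and a binary endpoint.
expression = [2.1, 0.4, 1.8, 0.9, 1.5, 0.2]
events = [1, 0, 0, 0, 1, 0]
print(marker_auroc(expression, events))                      # 0.875
print(marker_auroc(expression, events, high_is_poor=False))  # 0.125
```

An AUROCC near 0.5 means the marker ranks patients no better than chance, which is why flipping the assumed direction maps a value of 0.875 to 0.125.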
Figure 1. Kaplan-Meier curves comparing the 20-gene signature with the Adjuvant! 10-year overall survival score. The solid blue curve is the risk group for which both the 20-gene signature and Adjuvant! predict good prognosis (n = 38); the dotted blue curve is the group for which the 20-gene signature predicts good prognosis and Adjuvant! predicts poor prognosis (n = 66); the dotted red curve is the group for which the 20-gene signature predicts poor prognosis and Adjuvant! predicts good prognosis (n = 8); the solid red curve is the group for which both classifiers predict poor prognosis (n = 86).
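The four-group comparison in the figure can be sketched as follows: each patient is assigned to a group by the pair of binary predictions, and a Kaplan-Meier curve is estimated within each group. This is a minimal pure-Python sketch under assumed inputs. The record fields (`signature_poor`, `adjuvant_poor`, `time`, `event`) and the function names are hypothetical, and the original authors' software is not specified here.

```python
from collections import defaultdict


def kaplan_meier(times, observed):
    """Kaplan-Meier estimate, returned as [(t, S(t))] at each event time.

    `observed[i]` is truthy if the event occurred at `times[i]`, and falsy
    if the patient was censored at that time.
    """
    events_at = defaultdict(int)   # number of events at each time
    leaving_at = defaultdict(int)  # events plus censorings at each time
    for t, d in zip(times, observed):
        leaving_at[t] += 1
        if d:
            events_at[t] += 1
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving_at[t]
    return curve


def curves_by_group(patients):
    """Split patients by the two binary predictions and fit one curve each."""
    groups = defaultdict(lambda: ([], []))
    for p in patients:
        key = ("poor" if p["signature_poor"] else "good",
               "poor" if p["adjuvant_poor"] else "good")
        groups[key][0].append(p["time"])
        groups[key][1].append(p["event"])
    return {k: kaplan_meier(t, e) for k, (t, e) in groups.items()}
```

The keys of the returned dictionary are (signature prediction, Adjuvant! prediction) pairs, so the two discordant groups, ("good", "poor") and ("poor", "good"), can be inspected directly to see which classifier the observed outcomes follow.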