Wayne Xu, Shantanu Banerji, James R Davie, Fekadu Kassie, Douglas Yee, Robert Kratzke.
Abstract
Many studies have established gene expression-based prognostic signatures for lung cancer. All of these signatures were built from training data sets by learning the correlation of gene expression with patients' survival time. They require all new sample data to be normalized to the training data, which leads to the common problems of low reproducibility and impracticality. To overcome these problems, we propose a new signature model that does not involve data training. We hypothesize that the imbalance of two opposing effects in lung cancer cells, represented by Yin and Yang genes, determines a patient's prognosis. We selected the Yin and Yang genes by comparing expression data from normal lung and lung cancer tissue samples using both unsupervised clustering and pathway analyses. We calculated the Yin and Yang gene expression mean ratio (YMR) as the patient risk score. Thirty-one Yin and thirty-two Yang genes were identified and selected for signature development. In normal lung tissues, the YMR is less than 1.0; in lung cancer cases, the YMR is greater than 1.0. The YMR was tested for lung cancer prognosis prediction in four independent data sets, and it significantly stratified patients into high- and low-risk survival groups (p = 0.02, HR = 2.72; p = 0.01, HR = 2.70; p = 0.007, HR = 2.73; p = 0.005, HR = 2.63). It also predicted chemotherapy outcomes for stage II and III patients. In multivariate analysis, the YMR risk factor was more successful at predicting clinical outcomes than other commonly used clinical factors, with the exception of tumor stage. The YMR can be measured in an individual patient in the clinic, independent of gene expression platform. This study provides novel insight into the biology of lung cancer and sheds light on the clinical applicability of the signature.
Year: 2013 PMID: 23874744 PMCID: PMC3714286 DOI: 10.1371/journal.pone.0068742
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Identification and selection of Yin and Yang genes.
A. Gene identification by clustering. The probe sets are in rows and the samples are in columns. The expression indexes of all 12,625 probe sets across the 100 samples were summarized by the RMA algorithm and further normalized by itemwise Z-normalization. From the 2D clustering regions, 74 genes upregulated in cancer tissues (bottom half rows) and 108 genes downregulated in cancer tissues (top half rows) were selected. The preselected 74 and 108 probe sets were then clustered again for display. B. Selection of Yin (bottom) and Yang (top) genes by functional analysis. The two circles represent the two cores of functional effects of the Yin and the Yang. Genes highlighted in the same color are in the same interaction network.
Yin genes.
| HG_U95A probe set | Gene Symbol | Gene Title |
| 31711_at | GRIN2D | glutamate receptor, ionotropic, N-methyl D-aspartate 2D |
| 34552_at | GAST | gastrin |
| 35084_at | AMH | anti-Mullerian hormone |
| 32874_at | TCF3 | transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) |
| 32975_g_at | EXOSC2 | exosome component 2 |
| 33510_s_at | GRM1 | glutamate receptor, metabotropic 1 |
| 34027_f_at | HIST1H4J///HIST1H4K | histone cluster 1, H4j///histone cluster 1, H4k |
| 34510_at | CDT1 | chromatin licensing and DNA replication factor 1 |
| 37432_g_at | PIAS2 | protein inhibitor of activated STAT, 2 |
| 39651_at | RECQL4 | RecQ protein-like 4 |
| 40334_at | CSTF2 | cleavage stimulation factor, 3' pre-RNA, subunit 2, 64 kDa |
| 41623_s_at | FZR1 | fizzy/cell division cycle 20 related 1 (Drosophila) |
| 34664_at | FCGR2B | Fc fragment of IgG, low affinity IIb, receptor (CD32) |
| 35141_at | RNASEH2A | ribonuclease H2, subunit A |
| 36839_at | CDC6 | cell division cycle 6 homolog (S. cerevisiae) |
| 37267_at | THOP1 | thimet oligopeptidase 1 |
| 41149_at | LOC81691 | exonuclease NEF-sp |
| 33935_at | CACYBP | calcyclin binding protein |
| 34341_at | PPAT | phosphoribosyl pyrophosphate amidotransferase |
| 35800_at | PAFAH1B3 | platelet-activating factor acetylhydrolase 1b, catalytic subunit 3 (29 kDa) |
| 39909_g_at | TAF6L | TAF6-like RNA polymerase II, p300/CBP-associated factor (PCAF)-associated factor, 65 kDa |
| 40264_g_at | ZFPL1 | zinc finger protein-like 1 |
| 40532_at | BIRC5 | baculoviral IAP repeat-containing 5 |
| 1738_at | CDC25A | cell division cycle 25 homolog A (S. pombe) |
| 1601_s_at | IGFBP5 | insulin-like growth factor binding protein 5 |
| 1539_at | NRAS | neuroblastoma RAS viral (v-ras) oncogene homolog |
| 1133_at | EN2 | engrailed homeobox 2 |
| 966_at | RAD54L | RAD54-like (S. cerevisiae) |
| 895_at | MIF | macrophage migration inhibitory factor (glycosylation-inhibiting factor) |
| 652_g_at | RPA3 | replication protein A3, 14 kDa |
| 374_f_at | DDT///DDTL | D-dopachrome tautomerase///D-dopachrome tautomerase-like |
Yang genes.
| HG_U95A probe set | Gene Symbol | Gene Title |
| 34174_s_at | LPHN2 | latrophilin 2 |
| 36377_at | IL18R1 | interleukin 18 receptor 1 |
| 32904_at | PRF1 | perforin 1 (pore forming protein) |
| 34950_at | ZNF423 | zinc finger protein 423 |
| 37841_at | BCHE | butyrylcholinesterase |
| 39209_r_at | PPBP | pro-platelet basic protein (chemokine (C-X-C motif) ligand 7) |
| 39577_at | SOSTDC1 | sclerostin domain containing 1 |
| 39634_at | SLIT2 | slit homolog 2 (Drosophila) |
| 40322_at | IL1RL1 | interleukin 1 receptor-like 1 |
| 40398_s_at | MEOX2 | mesenchyme homeobox 2 |
| 41030_at | FOXJ1 | forkhead box J1 |
| 34267_r_at | LEPR | leptin receptor |
| 37194_at | GATA2 | GATA binding protein 2 |
| 37536_at | CD83 | CD83 molecule |
| 38315_at | ALDH1A2 | aldehyde dehydrogenase 1 family, member A2 |
| 39048_at | NOTCH4 | Notch homolog 4 (Drosophila) |
| 39085_at | TNNC1 | troponin C type 1 (slow) |
| 40763_at | MEIS1 | Meis homeobox 1 |
| 35828_at | CRIP2 | cysteine-rich protein 2 |
| 37710_at | MEF2C | myocyte enhancer factor 2C |
| 38734_at | PLN | phospholamban |
| 39544_at | SYNM | synemin, intermediate filament protein |
| 40231_at | SMAD6 | SMAD family member 6 |
| 40900_at | MYH10 | myosin, heavy chain 10, non-muscle |
| 2039_s_at | FYN | FYN oncogene related to SRC, FGR, YES |
| 1733_at | BMP6 | bone morphogenetic protein 6 |
| 1595_at | TEK | TEK tyrosine kinase, endothelial |
| 873_at | HOXA5 | homeobox A5 |
| 774_g_at | MYH11 | myosin, heavy chain 11, smooth muscle |
| 610_at | ADRB2 | adrenergic, beta-2-, receptor, surface |
| 560_s_at | TAL1 | T-cell acute lymphocytic leukemia 1 |
| 481_at | SNRK | SNF related kinase |
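The Yin and Yang gene expression mean ratio (YMR) described in the abstract can be illustrated with a minimal sketch. It assumes the score is the arithmetic mean of the Yin gene expression values divided by the arithmetic mean of the Yang gene expression values, both on a linear (non-log) scale. The function name `ymr`, the four-gene subsets, and all expression values are hypothetical and chosen only for illustration; they are not data from the study.

```python
from statistics import fmean

# Small illustrative subsets of the gene symbols listed in the two tables above.
YIN_GENES = ["BIRC5", "CDC6", "CDT1", "MIF"]
YANG_GENES = ["FYN", "TEK", "MEF2C", "SLIT2"]


def ymr(expression, yin_genes=YIN_GENES, yang_genes=YANG_GENES):
    """Return mean(Yin expression) / mean(Yang expression) for one sample.

    `expression` maps gene symbol -> linear-scale expression value (assumption).
    Genes absent from the sample are skipped (assumption).
    """
    yin = [expression[g] for g in yin_genes if g in expression]
    yang = [expression[g] for g in yang_genes if g in expression]
    if not yin or not yang:
        raise ValueError("sample lacks Yin or Yang gene measurements")
    return fmean(yin) / fmean(yang)


# Made-up samples: one with high Yin expression, one with high Yang expression.
tumor_like = {"BIRC5": 8.0, "CDC6": 6.0, "CDT1": 4.0, "MIF": 6.0,
              "FYN": 2.0, "TEK": 4.0, "MEF2C": 3.0, "SLIT2": 3.0}
normal_like = {"BIRC5": 2.0, "CDC6": 4.0, "CDT1": 3.0, "MIF": 3.0,
               "FYN": 8.0, "TEK": 6.0, "MEF2C": 4.0, "SLIT2": 6.0}

print(ymr(tumor_like))   # ratio above 1.0 for the Yin-dominated sample
print(ymr(normal_like))  # ratio below 1.0 for the Yang-dominated sample
```

Because the score is a ratio computed within a single sample, it needs no reference cohort, which is consistent with the abstract's claim that the YMR can be measured in an individual patient.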
Figure 2. Boxplot of the YMR distributions in normal lung samples and lung cancer samples.
Microarray gene expression data sets from different reports and different platforms were used. The data sets are described in Table S7.
Figure 3. Validation of YMR in four data sets by Kaplan-Meier estimates of the survivor function.
A. Recurrence-free time function curve (low risk n = 60; high risk n = 65) of the adenocarcinoma patients from Bhattacharjee et al. B. Overall survival time function curve of the adenocarcinoma patients (low risk n = 27; high risk n = 31) from Bild et al. C. Patient samples (low risk n = 248; high risk n = 194) of the DCC project. D. RNA-seq samples (low risk n = 121; high risk n = 137) from TCGA. Low YMR scores (in green) correspond to the highest predicted survival probability and high YMR scores (in red) correspond to the greatest predicted risk.
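For readers unfamiliar with the Kaplan-Meier estimator used in Figure 3, the following is a minimal sketch of the standard product-limit calculation for one risk group. The function name `kaplan_meier` and the follow-up times are hypothetical toy values, not data from any of the four cohorts, and the sketch does not reproduce the authors' analysis pipeline.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : follow-up time for each patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0   # events observed at time t
        leaving = 0    # patients leaving the risk set at time t
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (months) for a hypothetical risk group.
toy_times = [1, 2, 3, 4]
toy_events = [1, 0, 1, 1]  # the patient at month 2 is censored
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, s)
```

Running the estimator separately on the low-YMR and high-YMR groups gives the two curves that each panel of the figure compares.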
Comparison of YMR to different signature models previously reported.
| Report | Gene selection reproducibility affected by other causes of survival or false correlation | Signature modeling reproducibility affected by training set intrinsic features | Simplicity of signature delivery | Applicable to an individual without normalization |
| Gordon, 2002 | | | Simple. Two-gene ratio | |
| Raponi, 2006 | | | Simple equation with trained Cox coefficient | |
| Chen, 2007 | | | Simple equation with trained Cox coefficient | |
| Shedden, 2008 Method A | | | Complex equation with trained likelihood score | |
| Zhu, 2010 | | | Moderately complex equation with trained coefficients | |
| Wan, 2010 | | | Complex Bayesian equation with trained prior value | |
| Lu, 2012 | | | Moderately complex equation with trained coefficients | |
Figure 4. Kaplan-Meier estimates of the survivor function of the gYMR signature in different patient groups of the DCC data set.
A. Stage I only (low risk n = 122; high risk n = 177). B. Stage I who received chemotherapy (low risk n = 13; high risk n = 28). C. Stage I who did not receive chemotherapy (low risk n = 79; high risk n = 95). D. Stage II & III only (low risk n = 63; high risk n = 78). E. Stage II & III who received chemotherapy (low risk n = 24; high risk n = 23). F. Stage II & III who did not receive chemotherapy (low risk n = 27; high risk n = 31). Low gYMR scores (in green) correspond to the highest predicted survival probability and high gYMR scores (in red) correspond to the greatest predicted risk.