Xinsen Xu1, Lei Huang2, Chun Hei Chan3, Tao Yu4, Runchen Miao1, Chang Liu1.
Abstract
Cancer molecular profiling provides a better understanding of tumor mechanisms and can help improve cancer management. Here we present gene expression signatures from ~9000 human tumors with clinical information across 32 malignancies from The Cancer Genome Atlas (TCGA) project. We identified major predictors in the RNA sequencing data that were significantly correlated with cancer survival. The expression levels of these prognostic genes revealed significant genomic pathways that were clinically relevant to survival outcomes across human cancers. Furthermore, we show that in most cancer types, combining these genomic signatures with clinical information might yield improved predictions. Thus, with respect to clinical utility, our study reveals the promising value of genomic data from the pan-cancer perspective.
Keywords: cancer; expression; genome; prognosis; utility
Year: 2016 PMID: 27322207 PMCID: PMC5216771 DOI: 10.18632/oncotarget.10002
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Overview of the computational approach and patient characteristics
A. Flow diagram summarizing the data processing and analysis steps. B. Number of patient samples with survival data, organized by cancer type. C. Median age of the patients in each cancer type. D. Median survival time of the patients in each cancer type (some cancer types do not have enough death events to calculate a median survival time, either because of high survival rates or because of the small sample size of that cancer type). E. Frequency distributions of gender, tumor stage and survival outcome in the whole cancer population.
Figure 2. Prognostic landscape of gene expression in the whole cancer population
A. Top ten adverse and favorable pan-cancer prognostic genes were identified in the training group, ranked by the z scores. B. Risk score calculated by the top prognostic genes in the training group patients. Upper panel: risk-score distribution of the training group patients and survival status (blue indicates alive, and red indicates dead). Lower panel: heatmap showing the expression level of the top prognostic genes. C. Box plots of risk scores in different age groups, different gender groups, and different stage groups in the training group patients. D. Forest plot of risk score association with cancer mortality in the training group patients of different stages. E. Kaplan-Meier estimates of overall survival according to the risk score in the training set. F. Box plots of risk scores in different age groups, different gender groups, and different stage groups in the testing group patients. G. Forest plot of risk score association with cancer mortality in the testing group patients of different stages. H. Kaplan-Meier estimates of overall survival according to the risk score in the testing set.
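Panels E and H report Kaplan-Meier estimates of overall survival. As a minimal sketch of how such an estimate is computed (not the authors' code; the function name `kaplan_meier` and the toy data are assumptions for illustration), the product-limit estimator multiplies, at each observed death time, the fraction of at-risk patients who survive that time:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data (arbitrary time units), not taken from TCGA.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

To compare groups as in the figure, one would typically run the estimator separately on the patients above and below a chosen risk-score cutoff; the cutoff used by the authors is not stated in this section.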
Specific risk scores for different types of cancer
| Cancer Type | Risk Score |
|---|---|
| Whole population (binary) | Score = 0.66*B3GNT5+0.65*SLC11A1+0.65*ELF4+0.65*GALNT2+0.63*PA2G4P4+0.63*SKP2+0.60*S100A9+0.63*FOXM1+0.61*PSMB2+0.64*ARL6IP6-0.63*GPD1L-0.62*SECISBP2-0.58*XPC-0.61*TMED8-0.61*CREBL2-0.64*CRY2-0.64*MAGED2-0.68*CIRBP-0.69*CBX7-0.71*TADA2B |
| Whole population (continuous) | Score = 0.09*B3GNT5+0.17*SLC11A1+0.29*ELF4+0.12*GALNT2+0.18*PA2G4P4+0.16*SKP2+0.08*S100A9+0.14*FOXM1+0.04*PSMB3+0.23*ARL6IP6-0.34*GPD1L-0.35*SECISBP2-0.38*XPC-0.26*TMED8-0.36*CREBL2-0.37*CRY2-0.46*MAGED2-0.54*CIRBP-0.48*CBX7-0.42*TADA2B |
| ACC | Score = 2.48*MASTL+2.38*RECQL4+2.33*PRC1+2.36*KIF11+2.65*AMMECR1L+2.55*TRIP13+2.54*MKI67+2.59*NCAPD3+2.07*E2F1+2.06*FANCI-2.07*APH1B-2.3*CTSA-2.04*UPRT-2.02*HNRNPH2-2.28*NDRG4-2.47*PPFIBP2-1.94*LACTB-2.43*PTGR2-2.11*CHIC1-2.36*BDH2 |
| BLCA | Score = 0.81*SPNS1+0.77*GARS+0.78*NBAS+0.77*IFT122+0.75*NOMO1+0.74*TMX2+0.73*DHRS4+0.74*CCDC28B+0.74*TMEM109+0.74*DAD1-0.86*GATA2-0.77*TRIM26-0.76*MRPS6-0.77*YDJC-0.76*ZNF841-0.75*ZBTB49-0.72*ORMDL1-0.72*DEDD2-0.72*OGT-0.74*CTSH |
| BRCA | Score = 0.88*ZHX1+0.82*PRRC1+0.82*SCRN1+0.81*IARS+0.81*PTPN11+0.83*VPS35+0.78*MRS2+0.77*GRPEL2+0.79*TMEM65+0.76*PGK1-0.95*TNFRSF14-0.95*KDM4B-0.92*INO80B-0.88*LOC150776-0.86*MRPL23-0.87*PYCARD-0.84*ABHD14A-0.82*FGD3-0.84*SEC14L2-0.79*NFKBIA |
| CESC | Score = 1.14*PHRF1+1.09*TNRC18+1.06*ITGA5+0.98*DBN1+1.01*LATS2+1.01*TOR1AIP2+1.02*FASN+1*URGCP+0.95*SRI+0.95*ADAM9-1.28*TREX1-1.21*RBM38-1.07*LGALS9-1.04*HNRNPA3-0.97*NQO2-0.96*ZER1-0.97*ISCU-0.94*MTCP1NB-0.96*AKR1A1-0.98*SLC25A28 |
| CHOL | Score = 1.91*EIF5A+2*CEBPB+1.9*SCO1+1.89*ROM1+2.32*SRI+1.69*FAM54B+1.62*MNAT1+1.58*PSEN1+1.51*PDHB+1.66*SLC38A6-1.63*SCRN1-1.95*PGPEP1-1.83*EIF4ENIF1-1.62*SGSH-1.63*VSIG10-1.49*ACBD5-1.47*PURB-1.61*TNFAIP8-1.57*FUT4-1.38*FGD6 |
| COAD | Score = 1.24*TIAL1+1.14*SMNDC1+1.1*KIAA0907+1.09*POLR2J4+1.03*HSPA1L+0.94*ZBTB25+0.95*UBN2+0.95*SCRN3+0.98*ZBTB9+0.93*DNAJB6-0.99*CPT2-1.01*MRPL37-0.99*ATP8B1-0.96*CCDC149-0.92*EIF2C1-0.9*DYNLL2-0.96*ZCCHC11-0.91*MFN2-1.01*GSR-0.9*SAMM50 |
| DLBC | Score = 1.67*ELP4+1.48*API5+1.48*ARHGEF7+1.48*ATXN7L2+1.48*EXOC5+1.48*GMEB1+1.48*MEMO1+1.48*MPHOSPH10+1.48*MTOR+1.48*NEO1-1.48*TBKBP1-1.48*STXBP2-1.48*PUS1-1.48*PTRH1-1.48*POLR3D-1.48*KCNK6-1.48*IFI35-1.48*GPAA1-1.48*FHL3-1.48*FBXW5 |
| ESCA | Score = 1.13*B3GALTL+1.11*PGK1+1.18*GRPEL2+1.17*MAPRE1+1.03*SRXN1+1.02*LRRC58+0.99*NFATC3+0.96*ST13+0.94*TRMT6+0.92*MLLT11-1.02*UNC13D-0.98*PCSK7-0.94*PLCD3-1.01*DIP2A-0.98*PLEKHM1P-0.89*UNC93B1-0.87*ERAP2-0.84*LRCH4-0.86*CCBL2-0.84*C10orf54 |
| GBMLGG | Score = 1.98*GLA+1.83*KDELC2+2.01*WEE1+1.88*EMP3+1.8*DUSP10+1.84*CLIC1+1.88*TIMP1+1.84*CD58+1.79*DDB2+1.81*SHISA5-2.01*ZRANB1-1.9*GLUD1-1.88*FAM190B-1.78*RAP2A-1.79*ADD1-1.77*HDAC4-1.83*ARL3-1.74*PATZ1-1.79*SCAPER-1.73*RPL7 |
| HNSC | Score = 0.9*PGK1+0.8*USP10+0.78*TOMM34+0.8*SNX6+0.72*TMED2+0.7*PDIA3P+0.69*ADK+0.71*USP14+0.69*TRIM32+0.68*HPRT1-0.75*ZNF266-0.69*ZNF700-0.64*AHCYL2-0.65*SH3BP2-0.65*ZNF577-0.64*ZNF557-0.64*ATXN7L2-0.64*ZNF20-0.63*DUSP16-0.63*CDK3 |
| KICH | Score = 2.36*PNPT1+2.33*PTP4A2+2.31*GPN1+2.31*GPATCH2+2.3*PLEKHA2+2.29*NRAS+2.27*PDS5A+2.27*KDM1B+2.27*TTF2+2.26*NT5DC3-2.36*FIZ1-2.34*TST-2.34*C14orf1-2.34*ELAVL1-2.33*KLHL26-2.31*CES2-2.31*CTDP1-2.31*SUSD1-2.3*USF2-2.3*COPS7A |
| KIRC | Score = 1.28*DONSON+1.24*STRADA+1.2*ATP13A1+1.19*NOP56+1.18*CARS+1.18*ANAPC7+1.16*ANAPC5+1.14*SBNO2+1.15*NCLN+1.18*FKBP11-1.16*SGCB-1.15*PINK1-1.12*FBXO3-1.11*SSFA2-1.1*ITGA6-1.01*HBP1-1*FBXL3-1.02*RNF20-1*PURA-0.98*FBXL5 |
| KIRP | Score = 1.7*GLT25D1+1.48*LMNB2+1.54*SPAG5+1.96*ADA+1.41*PUS7+1.61*CCNF+1.5*RHBDF2+1.7*P4HB+1.58*TSEN15+1.41*AEBP1-1.65*TMCO4-1.49*PGPEP1-1.63*FBXL5-1.51*HTATSF1-1.56*CCDC71-1.56*ACTR8-1.36*CC2D2A-1.42*PARP3-1.39*ZBTB3-1.39*SLC25A11 |
| LAML | Score = 1.1*TOMM40L+0.95*NUP210+0.91*PARP3+0.83*DDIT4+0.83*CLCN5+0.79*FIBP+0.78*RPS6KA1+0.77*PSMA7+0.76*RINL+0.76*PARVB-0.99*PWWP2A-0.97*MBTPS1-0.87*NHLRC3-0.87*LOC646762-0.86*ADSS-0.84*TGIF1-0.81*SIAH1-0.83*DET1-0.8*KCTD15-0.79*FCHSD2 |
| LIHC | Score = 0.97*HNRNPH1+0.81*N4BP3+0.82*LDHA+0.81*ZCRB1+0.84*YBX1+0.78*STK39+0.78*ATP6V1E1+0.8*ANXA5+0.78*HN1+0.76*ATP1B3-0.81*STAT5B-0.79*C9orf3-0.79*CHST14-0.76*SIK2-0.72*POLDIP2-0.73*ATF7IP2-0.72*SLC23A2-0.67*STIM1-0.65*MIA3-0.65*PSD4 |
| LUAD | Score = 0.72*ITGA6+0.73*C1QTNF6+0.72*MTHFD1+0.7*DNAJB4+0.7*BACH1+0.69*CCNA2+0.65*EXT1+0.65*FSCN1+0.66*DNAJB6+0.65*NOC3L-0.91*SLC25A42-0.82*PRKCD-0.79*DBP-0.75*DENND1C-0.71*NRL-0.72*C19orf42-0.73*ALAD-0.71*SLC11A2-0.68*ABAT-0.67*FAM117A |
| LUSC | Score = 0.74*CD14+0.66*ARHGAP1+0.63*CD151+0.62*FSTL3+0.6*RALGAPA2+0.59*CST3+0.57*C11orf2+0.56*SNX29+0.56*FAM109B+0.54*EHD1-0.69*ERH-0.65*NDUFB1-0.59*CBX1-0.56*EMD-0.55*RLIM-0.53*FAM103A1-0.53*MNAT1-0.53*VRK1-0.51*SS18L2-0.5*FKBP3 |
| MESO | Score = 1.71*CDCA8+1.63*KPNA2+1.62*SPAG5+1.54*CCNA2+1.64*IQGAP3+1.66*FOXM1+1.5*HMGB2+1.51*MAD2L1+1.52*CDCA5+1.58*PRC1-1.53*KLHL9-1.44*ETAA1-1.41*THTPA-1.36*HIST1H2BD-1.32*FOXO4-1.39*FBXO44-1.28*HIST1H2AC-1.39*HIST1H2BK-1.3*SH3BGRL-1.36*TMBIM4 |
| OV | Score = 0.6*CBLL1+0.59*CACNA1C+0.56*SOCS5+0.54*ZNF384+0.54*CACNB1+0.53*SEMA4F+0.52*AGPAT6+0.52*CHKA+0.54*GLIS2+0.52*GLCE-0.77*NPEPL1-0.6*TLCD1-0.57*LMO4-0.55*CASP6-0.54*ISG20-0.55*AP4B1-0.53*SAT1-0.52*ZNF326-0.51*ENSA-0.5*AP1S2 |
| PAAD | Score = 1.31*ATG12+1.3*ASCC1+1.33*NFE2L3+1.31*KIAA1609+1.3*CCDC6+1.2*EIF2A+1.26*TMOD3+1.21*AP3S1+1.24*METAP1+1.22*NCK1-1.33*USP20-1.27*MUM1-1.27*REC8-1.24*RBM6-1.21*ARMC5-1.23*DEF8-1.27*KLHL22-1.13*C7orf43-1.14*MGC23284-1.1*ELMOD3 |
| PCPG | Score = 2*GLE1+1.99*EFTUD1+1.99*NARG2+1.98*CIZ1+1.97*ZNF490+1.97*TTC9C+1.96*FAM178A+1.96*ABCA1+1.95*AKAP13+1.95*LOC642852-2.03*HMOX2-1.96*DGCR14-1.96*SLC10A3-1.95*ITFG3-1.94*FAM118A-1.93*MBD3-1.93*USE1-1.92*ICOSLG-1.91*FSCN1-1.91*TMEM167B |
| PRAD | Score = 20.3*EXTL2+20.3*B3GNT5+20.3*SEMA4C+20.3*NUDCD2+20.3*GNAI1+20.3*THUMPD1+20.3*CNNM3+20.3*RNF138+20.3*PRPF4+20.3*FASTKD3-20.3*MRM1-20.3*DAP-20.3*PAOX-20.3*PLA2G15-20.3*SBNO2-20.3*STK19-20.3*CCDC85C-20.3*TBXAS1-20.3*NFATC1-20.3*HSD17B7 |
| READ | Score = 2.26*PSMA3+2.84*PHLPP1+2.91*CNDP2+2.96*CORO1A+2.2*AKR7A2+2.82*SSBP2+2.82*TMEM173+2.7*ATP6V0C+2.7*NFYC+2.79*B4GALT3-2.28*OSGEPL1-2.96*PHF20-3.01*ANKRD27-2.95*ZNF853-2.95*RAPGEF2-2.88*SETD2-2.87*MSH6-2.85*ATM-3.01*SIRT5-2.81*SGK3 |
| SARC | Score = 1.12*RLIM+1.03*BAIAP3+0.97*FUBP1+1*ZNF146+0.99*ATXN10+0.95*LRRC41+0.93*LRRC47+0.9*DOCK7+0.9*ZNF697+0.89*LAPTM4B-1.13*TRIM21-1.08*B3GALT4-1.04*CCDC69-1.01*CCNDBP1-0.97*C14orf159-0.95*GALK2-0.91*PARP14-0.91*ATP2A3-0.86*C15orf24-0.84*PPAP2A |
| SKCM | Score = 0.75*HN1L+0.7*GATAD2A+0.68*NT5DC2+0.66*VDAC1+0.65*KPNA2+0.62*FOXM1+0.62*DCTN2+0.61*CDC25A+0.6*SLC25A3+0.61*SLC25A15-0.81*GBP2-0.77*APOL6-0.77*IFITM1-0.75*FCGR2A-0.74*FAM96A-0.72*PARP9-0.72*APOBEC3F-0.71*NXT2-0.7*UBA7-0.7*APOL1 |
| STAD | Score = 2.33*SLC9A3R2+1.91*ITPRIP+1.74*SOCS2+2.04*C1orf144+1.7*LOC282997+1.69*BMP2K+2.64*VPS52+1.93*UBE4B+1.51*CXCR7+1.9*NDUFA11-2.55*SLC33A1-2.27*TMEM66-2.18*UBA5-2.14*CD47-1.8*C21orf59-2.03*NSF-2.01*FUNDC1-1.97*RAB1A-1.69*C14orf142-1.95*PFDN4 |
| TGCT | Score = 1.03*FAM177A1+1.03*NBR2+1*ATAD2B+0.98*C8orf73+0.96*FMNL2+0.95*CEBPA+0.95*VCPIP1+0.94*C12orf23+0.94*LMBR1L+0.94*ABCC5-1.06*MYO1E-1.03*CABLES1-1*FAM84B-0.99*TOP1-0.98*NCSTN-0.97*NAIF1-0.97*IRS2-0.97*HIBADH-0.97*FUBP3-0.97*PGM1 |
| THCA | Score = 2.06*IQSEC1+1.99*FLYWCH1+1.88*ZHX3+2.12*SEMA6A+1.78*FTO+1.76*LARS+1.74*TGFBR3+2.03*PTEN+1.72*ZNF324+2.72*CEP250-1.96*ANXA1-1.89*SEC14L2-2.18*CIR1-2.17*MED17-2.15*ITGB1BP1-1.86*SRP68-2.14*VAMP8-2.08*PSME2-2.77*RPS27-1.73*CLU |
| THYM | Score = 2.5*RARG+2.45*RBM47+2.39*PELI3+2.39*ATP1B1+2.39*TST+2.35*NUDT16+2.38*DENND1A+2.35*PPAPDC1B+2.34*GNS+2.3*TBC1D16-2.44*ADRBK1-2.43*PDSS1-2.41*SEMA4D-2.4*INTS8-2.4*VRK1-2.4*PTP4A2-2.39*CUTC-2.39*SEMA7A-2.38*SCLT1-2.37*ANKRD27 |
| UCEC | Score = 2.12*TUBB2A+1.92*TAOK3+2.11*ENDOD1+2.05*KLF11+2.06*SYNPO+2.02*BRAF+2.02*SYTL2+2.05*SPAG5+1.72*MCL1+1.73*ARMC1-1.81*SETD6-1.8*LYRM1-1.58*PYCRL-1.58*YDJC-1.71*CRBN-1.94*C15orf29-1.52*PHF5A-2.57*PPA1-1.56*WWOX-1.64*IFT140 |
| UCS | Score = 1.27*S100A10+1.23*PDE4A+1.21*STMN3+1.22*ARL4D+1.18*HIBCH+1.16*FN3K+1.2*SEC23B+1.12*NINJ1+1.16*LOC728554+1.16*CTU1-1.86*CBX5-1.32*DNMT3A-1.31*PSMD7-1.35*PCBP2-1.25*C2orf68-1.18*BUD13-1.17*ZNRF1-1.21*SSRP1-1.19*ST3GAL2-1.22*TUT1 |
| UVM | Score = 2.32*GTF3A+3.14*PSTPIP2+2.27*SPAG1+3.03*SFT2D2+2.23*LIPA+2.2*IMPA1+2.21*JTB+2.16*COQ2+2.93*ALG5+2.97*ISG20-3.14*RABL2B-2.26*C16orf86-2.19*CNP-3.01*C3orf39-2.19*C3orf37-2.17*TBKBP1-2.14*TOM1L2-2.17*RPL32P3-2.89*PPP2R3B-2.16*QRICH1 |
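Each row of the table is a linear combination of gene expression values. The following is a minimal sketch, under stated assumptions, of how such a formula could be parsed and evaluated for one patient. The helper names `parse_risk_formula` and `risk_score` are hypothetical, the formula string is only an excerpt of the whole-population (binary) row, and the expression values are invented; how expression was scaled or dichotomized before scoring is not stated in this section.

```python
import re


def parse_risk_formula(formula):
    """Parse 'Score = 0.66*GENE1-0.63*GENE2...' into {gene: coefficient}."""
    rhs = formula.split("=", 1)[1].replace(" ", "")
    # Each term is an optionally signed number, '*', then a gene symbol.
    terms = re.findall(r"([+-]?\d*\.?\d+)\*([A-Za-z0-9]+)", rhs)
    return {gene: float(coef) for coef, gene in terms}


def risk_score(coefficients, expression):
    """Weighted sum of expression values; raises KeyError if a gene is missing."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Excerpt of the whole-population (binary) formula, for illustration only.
excerpt = "Score = 0.66*B3GNT5+0.65*SLC11A1-0.63*GPD1L-0.62*SECISBP2"
coefs = parse_risk_formula(excerpt)

# Hypothetical per-patient expression indicators (assumed 1 = high, 0 = low).
patient = {"B3GNT5": 1, "SLC11A1": 0, "GPD1L": 1, "SECISBP2": 0}
score = risk_score(coefs, patient)
```

A positive coefficient marks a gene whose higher expression raises the score (adverse), and a negative coefficient marks a favorable gene, consistent with the adverse and favorable gene lists in Figure 2A.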
Figure 3. Prognostic landscape of pathway scores in the whole cancer population
A, E. Heatmaps depicting gene expression levels after unsupervised hierarchical clustering in the training set and testing set, respectively. Expression levels are indicated on a low-to-high scale (green-black-red). Two clusters are defined, namely the high-risk group and the low-risk group. B, F. GSEA was performed in the training set and testing set, respectively, to identify biological pathways associated with survival outcome. FWER p-values are indicated on a low-to-high scale (light blue to dark blue). The number of significant genes in each gene set is indicated by the circle size. C, G. Forest plots of pathway score association with cancer mortality in the training set and testing set, respectively. D, H. Scatter plots of correlations between risk scores and the E2F pathway scores in the training set and testing set, respectively.
Figure 4. C-indexes of models trained on gene expression data alone or in combination with clinical variables
A. C-indexes calculated for ACC, BLCA, BRCA, CESC, COAD, ESCA, GBMLGG, HNSC, KIRC and KIRP. B. C-indexes calculated for LAML, LIHC, LUAD, LUSC, MESO, OV, PAAD, SARC, SKCM and UCS. The light blue box indicates the model built from gene expression data alone, and the dark blue box indicates the model built from the combination of gene expression data and clinical variables.
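The C-index measures how often a model ranks patients correctly by risk. The sketch below assumes Harrell's commonly used definition, since the exact variant used for Figure 4 is not stated in this section; the function name `concordance_index` and the toy data are illustrative only. A pair of patients is comparable when the one with the shorter follow-up had an observed death, and it is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times       : follow-up time for each patient
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if patient i died strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical data, not taken from TCGA.
c = concordance_index([1, 2, 3, 4], [1, 0, 1, 1], [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing the two boxes for a cancer type shows whether adding clinical variables improves discrimination.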