Omri Nayshool, Nitzan Kol, Elisheva Javaski, Ninette Amariglio, Gideon Rechavi.
Abstract
Motivation: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancer. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset.
Keywords: Cancer survivors; classification; gene expression; molecular targeted therapy; supervised machine learning
Year: 2022 PMID: 36225330 PMCID: PMC9549197 DOI: 10.1177/11769351221127875
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Sample summary for each TCGA project, covering all samples in the cohort and the samples with RNA-seq data. The mean area under the receiver operating characteristic curve (AUC-ROC) over the last 500 models (500 down to 1 feature) was calculated for each project. Bold rows mark the models with a mean AUC-ROC above 0.8. The CPTAC3-ccRCC and CPTAC3-UCEC data were tested on the selected model with the minimal number of features for each project, and the AUC-ROC was calculated for each dataset.
| Project | Cancer type | Tumor-free (all TCGA samples) | Deceased (all TCGA samples) | Tumor-free/Total (all TCGA samples) | Tumor-free (RNA-seq) | Deceased (RNA-seq) | Tumor-free/Total (RNA-seq) | AUC-ROC average (top 500 models) | AUC-ROC CPTAC3-ccRCC | AUC-ROC CPTAC3-UCEC | Number of final selected model features |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA-HNSC | Head and neck squamous cell carcinoma | 50 | 170 | 0.23 | 47 | 166 | 0.221 | 0.72 | 0.50 | 0.34 | 30 |
| TCGA-LUSC | Lung squamous cell carcinoma | 49 | 161 | 0.23 | 49 | 157 | 0.238 | 0.55 | 0.53 | 0.42 | 29 |
| TCGA-LUAD | Lung adenocarcinoma | 39 | 127 | 0.23 | 36 | 125 | 0.224 | 0.75 | 0.62 | 0.30 | 47 |
| TCGA-READ | Rectum adenocarcinoma | 3 | 9 | 0.25 | 3 | 8 | 0.273 | | | | |
| TCGA-LIHC | Liver hepatocellular carcinoma | 32 | 91 | 0.26 | 32 | 88 | 0.267 | 0.67 | 0.66 | 0.49 | 99 |
| TCGA-BLCA | Bladder urothelial carcinoma | 49 | 109 | 0.31 | 48 | 107 | 0.310 | 0.76 | 0.54 | 0.50 | 50 |
| TCGA-SKCM | Skin cutaneous melanoma | 84 | 156 | 0.35 | 84 | 154 | 0.353 | 0.70 | 0.51 | 0.43 | 63 |
| TCGA-KIRC | Kidney renal clear cell carcinoma | 159 | 162 | 0.50 | 157 | 157 | 0.500 | 0.79 | 0.77 | 0.65 | 96 |
| TCGA-BRCA | Breast invasive carcinoma | 193 | 104 | 0.65 | 191 | 104 | 0.647 | 0.71 | 0.30 | 0.34 | 99 |
| TCGA-UCEC | Uterine corpus endometrial carcinoma | 94 | 45 | 0.68 | 93 | 44 | 0.679 | 0.72 | 0.75 | 0.63 | 19 |
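As an illustration of the two quantities reported in the table, the sketch below computes a Tumor-free/Total ratio and an AUC-ROC score in plain Python. It is a minimal sketch, not the authors' pipeline: the function names `class_ratio` and `auc_roc` are hypothetical, and treating tumor-free as the positive class (label 1) is an assumption made here for illustration.

```python
from typing import Sequence


def class_ratio(tumor_free: int, deceased: int) -> float:
    """Tumor-free/Total ratio, as in the table's ratio columns."""
    return tumor_free / (tumor_free + deceased)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via its rank interpretation: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as 0.5).

    Assumption: label 1 = tumor-free, label 0 = deceased.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# TCGA-HNSC RNA-seq counts from the table: 47 tumor-free, 166 deceased.
print(round(class_ratio(47, 166), 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples like those in the table.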
Figure 1. Model generation workflow for each TCGA project.
Figure 2. AUC-ROC results as a function of the number of features for 3 datasets: TCGA selected model data, CPTAC3-ccRCC, and CPTAC3-UCEC. Each dataset was tested on each model and the AUC-ROC score was calculated. The blue line represents the average AUC-ROC over all 500 results of the TCGA dataset.
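The averaging described in the captions (a mean AUC-ROC over the models with 500 down to 1 features, compared against 0.8) can be sketched as follows. This is a hedged illustration only: the function name `summarize_auc`, the dictionary layout, and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Dict, Tuple


def summarize_auc(auc_by_n_features: Dict[int, float],
                  max_features: int = 500,
                  threshold: float = 0.8) -> Tuple[float, bool]:
    """Average the AUC-ROC of the models with 1..max_features features and
    report whether that mean exceeds the threshold (0.8 in the table caption).
    """
    last_models = [auc for n, auc in auc_by_n_features.items()
                   if 1 <= n <= max_features]
    if not last_models:
        raise ValueError("no models within the requested feature range")
    avg = mean(last_models)
    return avg, avg > threshold


# Toy values for illustration only (not results from the paper).
toy_scores = {1: 0.70, 2: 0.90, 3: 0.95, 600: 0.10}
print(summarize_auc(toy_scores))
```

Models with more than `max_features` features are ignored, mirroring the restriction to the last 500 models.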
Each dataset was tested on the top 5 models. The AUC-ROC score was calculated from the prediction rate for each dataset.
| Dataset | CESC model | COAD model | KIRP model | LGG model | SARC model |
|---|---|---|---|---|---|
Red indicates an opposite prediction correlation, with scores ranging from 0 to 0.5. Green indicates a direct prediction correlation, with scores ranging from 0.5 to 1. White indicates no correlation.
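The color legend can be read as a simple mapping from an AUC-ROC score to a color and an intensity. The sketch below is one possible reading, under two assumptions that the legend does not state explicitly: white corresponds to a score of exactly 0.5, and intensity grows linearly with the distance from 0.5. The function name `legend_color` is hypothetical.

```python
from typing import Tuple


def legend_color(auc: float, tol: float = 1e-9) -> Tuple[str, float]:
    """Map an AUC-ROC score to (color, intensity in [0, 1]).

    Assumptions: 0.5 means no correlation (white); intensity is the
    distance from 0.5 rescaled to the 0..1 range.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC-ROC must lie in [0, 1]")
    if abs(auc - 0.5) <= tol:
        return ("white", 0.0)
    intensity = abs(auc - 0.5) * 2
    return ("green" if auc > 0.5 else "red", intensity)


print(legend_color(0.75))
```

Under this reading, a score of 0.30 on an external dataset would be drawn in red, meaning the model's predictions run opposite to the observed outcome.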
The features of the selected TCGA models were analyzed for enriched networks using Ingenuity Pathway Analysis (IPA) software. Only significant results (P-value < 10^−20) are shown, together with the network molecules and the observed function. For the TCGA-CESC feature list, there were 2 matched networks with different but related functions.
| Project model | Model features (ordered by significance) | Pathway nodes | Pathway score | Molecules in pathways | Pathway function |
|---|---|---|---|---|---|
| TCGA-KIRP | DMBT1, RHEBL1, AC1448313, RP11_665C166, TOX2, IL11, GBP1P1, FAM83D, NLRP9P1, PLCB3, PROX1-AS1, RP3_337H46, HOXB6, FGD5, RP11_83N96, RP11_395L1417, LOC105375267, RP11_627K111, CUX1, ZSWIM1, RP11_214N155, TBL1XR1, MDS2, IL20RB, TPM2, RP11_134K12, LINC01108, RP11_181B111, AC0921714, KRT8P5, PLBD2, RP11_427P53, LZTS3, RP11_466P246, E2F8, TRIB3, LINC01358, GABBR2, RAP2C-AS1, NPAS1, PIM1, RP11_553N16, | Akt, ANGPT4, CCND1, CDKN1B, CSTB, CUX1, DMBT1, E2F8, ERK, ERK1/2, FAM83D, GABBR2, GBP1P1, HAPLN3, Histone h3, HOXB6, IL11, IL20RB, Lh, LRRTM2, MOSPD2, NFkB (complex), PIM1, PLCB3, SMARCA4, STAB2, STAT1, TBL1XR1, TBX21, TMEM204, TOX2, TPM2, TRIB3, USPL1, WNT8B | 10^−38 | 15 | [Cell cycle, connective tissue development and function, renal and urological system development and function] |
| TCGA-CESC | PIP4P2, LOC102724050, P4HA2, LINC01152, HENMT1, RP5_882O71, PLAAT2, RPS19P1, RBM38, PTMAP10, DNAJC9-AS1, DAAM2-AS1, mir-210, EREG, FNDC4, TMEM253, ANKRD37, ARHGEF25, ESM1, NPY1R, SLC10A3, EEF1E1, MMP8, RP11_447H194, RP11_45A174, SERPINH1, ITGA5, FOXC2, CTB_193M123, BAIAP2L1, FUT11, VEGFA, ANXA5, ANKRD20A11P, ENPEP, RP11_89F32, FUNDC2P1, RPSAP5, TMEM120B, UBAC1, COL4A2, LATS2, RP11_378A132, RP11_598F75, PTTG3P, EPN2, SLN, TMEM138, ABHD1, SEPSECS-AS1, SLC19A3, BCORL1, ZNF696, TXNP6, WASHC2A/WASHC2C, CHST14, MATN3, MPRIP, KCND2, RP11_107F64, MMP1, BCO1, SLC35A4, LINC00460, H3P36, APEX2, ANKRD34B, CTRB2, GCOM1, LRRN4, SPON1, RP11_455F54, MMP3, LOC105378645, PTPRB, ITGAD, ITGAV, SCN2A, GLUL, ANGPTL6, | ARHGEF25, BAIAP2L1, CPS1, E2F3, EBAG9, EEF1E1, ENPEP, EPN2, FUT1, FUT11, GLUL, HENMT1, HLA-J, HNF1A, HNMT, ITGAD, ITGB2, LATS2, mir-210, MLF2, MMP8, MPRIP, NINJ1, NR3C1, PLEKHF1, PPP1R14C, RHOA, SCO2, SERPINH1, SMARCA4, SON, SRC, TNF, TP53, YWHAG | 10^−28 | 14 | [Cardiovascular system development and function, organismal injury and abnormalities, reproductive system disease] |
| | | ABLIM, Akt, ANXA5, CG, COL4A2, EREG, ERK, ERK1/2, ESM1, estrogen receptor, FOXC2, FSH, GNRH, IRS, ITGA5, ITGAV, Jnk, Lh, MAP4K4, MMP1, MMP3, MT3, NFkB (complex), P38 MAPK, P4HA2, Pdgf (complex), PDGF BB, PDXK, POP5, PTPRB, RBM38, REXO5, SRD5A2, TLK1, VEGFA | 10^−25 | 13 | [Dermatological diseases and conditions, inflammatory disease, inflammatory response] |
| TCGA-SARC | NMU, B4GALT2, SHOC2, RNU2-22P, MON1B, HNRNPR, ARHGAP28, NDST2, ZFYVE28, ZNF146, SFTA2, RBM48, ARMCX3, ADCY1, QSER1, POR, BTF3L4, LINC01121, ERI3, RP11_680G244, COPG1, BLOC1S6, RAB5B, ELOVL2, DPP9-AS1, HPS6, ISCU, RPL29P19, TENT2, ARMH3, JRKL, NKX6-1, AC241585.1, FEM1AP2, NFKB2, ACOX | ACOX1, ARHGAP28, BLOC1S6, CCND1, CLEC11A, COPG1, DHRS2, ERK1/2, HNRNPR, HPS6, HSPA1L, HSPA2, ISCU, JKAMP, KHDRBS1, LATS, miR-515-3p (and other miRNAs w/seed AGUGCCU), MLF2, MYB, NFkB (complex), NFKB2, NMU, POR, RAB22A, RPTOR, SH3GLB2, SHOC2, SLC9B2, STAB2, TFAP2A, TGM2, TIMM8A, TP53, TRIM6, ZNF146 | 10^−28 | 12 | [Cell cycle, cell death and survival, cellular development] |
| TCGA-LGG | RP11_13N135, RP11_10C243, LOC101929494, SP7, RP11_54O71, MAN1B1-DT, ACTR1A, SPATA17, LOC105372974, AQP7, RP11_394B25, RP11_14C103, HOMER2, LINC01521, RP11_330H66, BEX4, SORT1, RP11_141O111, LOC100287042, BOK-AS1, MGC16275, CTC_498J121, STARD9, TMEM67, LOC101928982, SPINK5, HINT3, INPPL1, NUTM2A-AS1, KLF2, RPL39P36, RP1_196A121, RNU6-453P, AC1456762, LINC00624, ZNF655, LINC02198, RP3_337O189, CNTNAP4, RP11-517H2.6, MTND5P16, ACAD10, LOC728975, FKBP3, PPP2R2B, RP11_299G205, LOC400710, RP11_429P32, RP11_497D63, LINC01270, IGFBP3, DNPEP, RP1_125I34, RP11_312J185, TPST2, MYL2, TAGAP, MIDN, RP11_111F52, FAM171A1, LINC00671, CLDN6, RP11_53I64, RNU6-1196P, RP11_299G202, ZNF514, SDCBP2P1, FAM204A, AP0002556, AC0040143, PRR26, MLLT6, MARCHF5, MAPK8IP1, ADAMTS12, RP11_631N164, SUDS3P1, FIRRE, RP11_9E171, DNAJA3, LOC101929592, DACT3-AS1, TNF, RP11_514P82, TNFRSF11B, IKBI | ADCYAP1, ADGRB1, BCAR3, caspase, CCL27, CLEC11A, DNAJA3, ERK, ERK1/2, FBXO31, FIRRE, FXN, HCG11, Histone h3, Hsp90, HTR4, IGFBP3, IL17RD, KLF2, MAP4K4, MAPK8IP1, NEU1, P2RY6, P38 MAPK, PPP2R2B, RBM17, SLC8A1, SND1-BRAF, SP7, STK10, SYK, SYNPO, TAGAP, TNF, TNFRSF11B | 10^−20 | 10 | [Connective tissue disorders, organismal injury and abnormalities, skeletal and muscular disorders] |
| TCGA-COAD | ZNF266, IZUMO1, ORM2, RPSAP4, DDX50P1, PABPC1P3, ZNF767P, RP5-1056L3.3, U4792429, RALGAPB, RP11_15B245, RP4_665J23 | No significant network | | | |
Figure 3. Mean score results from the TCGA-KIRP model on the TCGA-KIRP and CPTAC3-ccRCC groups (deceased and tumor-free). †P-value = 2.23 × 10^−54. ‡P-value = .01.
Figure 4. Shared features between the top networks of the TCGA-KIRP prediction models: (A) The top network from the IPA prediction for the TCGA-KIRP 300-feature model. That network is associated with the Cancer, Organismal Injury and Abnormalities, and Reproductive System Disease pathways. The gray nodes are the nodes from the model feature list (26 out of 35 network nodes, P-value = 10^−42). (B) The top network from the IPA prediction for the TCGA-KIRP 42-feature model. That network is associated with Cell Cycle, Connective Tissue Development and Function, and Renal and Urological System Development and Function. The gray nodes are the nodes from the model feature list (15 out of 35 network nodes, P-value = 10^−38). The blue nodes are the genes shared between the networks that are also features in both models: DMBT1, IL11, HOXB6, TRIB3, PIM1.