Xinghao Yu1, Ting Wang1, Shuiping Huang1,2, Ping Zeng1,2.
Abstract
BACKGROUND: Previous cancer prognostic prediction models often consider only the most important transcriptomic expressions, and their power is limited. It is unknown whether prediction power can be further improved when additional transcriptomic information is incorporated.
Keywords: Cox model; gene expression; linear mixed model; prognostic prediction; regularization method; the Cancer Genome Atlas
Year: 2020 PMID: 32973875 PMCID: PMC7472843 DOI: 10.3389/fgene.2020.00920
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Basic information of the raw data sets and quality control for 32 cancers in TCGA used in the present study.
| Cancer | Initial gene expression (samples) | Initial gene expression (genes) | Initial clinical (samples) | After combined (samples) | After quality control (samples) | After quality control (genes) |
| Adrenocortical Carcinoma (ACC) | 79 | 20,530 | 92 | 79 | 77 | 19,194 |
| Bladder Urothelial Carcinoma (BLCA) | 426 | 20,530 | 436 | 425 | 400 | 20,164 |
| Breast Invasive Carcinoma (BRCA) | 1,218 | 20,530 | 1,247 | 1,215 | 901 | 20,131 |
| Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) | 308 | 20,530 | 313 | 307 | 287 | 19,556 |
| Cholangiocarcinoma (CHOL) | 45 | 20,530 | 45 | 45 | 36 | 19,352 |
| Colon Adenocarcinoma (COAD) | 329 | 20,530 | 551 | 328 | 270 | 18,707 |
| Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC) | 48 | 20,530 | 48 | 48 | 41 | 18,125 |
| Esophageal Carcinoma (ESCA) | 196 | 20,530 | 204 | 196 | 159 | 19,501 |
| Glioblastoma Multiforme (GBM) | 172 | 20,530 | 629 | 172 | 143 | 17,996 |
| Head and Neck Squamous Cell Carcinoma (HNSC) | 566 | 20,530 | 604 | 566 | 440 | 19,526 |
| Kidney Chromophobe (KICH) | 91 | 20,530 | 91 | 91 | 65 | 18,104 |
| Kidney Renal Clear Cell Carcinoma (KIRC) | 606 | 20,530 | 945 | 606 | 526 | 19,212 |
| Kidney Renal Papillary Cell Carcinoma (KIRP) | 323 | 20,530 | 352 | 323 | 256 | 19,309 |
| Acute Myeloid Leukemia (LAML) | 173 | 20,530 | 200 | 173 | 161 | 16,718 |
| Brain Lower Grade Glioma (LGG) | 530 | 20,530 | 530 | 530 | 502 | 17,308 |
| Liver Hepatocellular Carcinoma (LIHC) | 423 | 20,530 | 438 | 422 | 343 | 19,382 |
| Lung Adenocarcinoma (LUAD) | 576 | 20,530 | 706 | 576 | 486 | 20,068 |
| Lung Squamous Cell Carcinoma (LUSC) | 553 | 20,530 | 626 | 552 | 481 | 20,004 |
| Mesothelioma (MESO) | 87 | 20,530 | 87 | 87 | 81 | 19,463 |
| Ovarian Serous Cystadenocarcinoma (OV) | 308 | 20,530 | 630 | 308 | 290 | 19,404 |
| Pancreatic Adenocarcinoma (PAAD) | 183 | 20,530 | 196 | 183 | 175 | 19,609 |
| Pheochromocytoma and Paraganglioma (PCPG) | 187 | 20,530 | 187 | 187 | 176 | 17,886 |
| Prostate Adenocarcinoma (PRAD) | 550 | 20,530 | 566 | 550 | 483 | 18,456 |
| Rectum Adenocarcinoma (READ) | 105 | 20,530 | 186 | 105 | 81 | 18,316 |
| Sarcoma (SARC) | 265 | 20,530 | 271 | 264 | 258 | 20,083 |
| Skin Cutaneous Melanoma (SKCM) | 474 | 20,530 | 477 | 481 | 409 | 19,638 |
| Stomach Adenocarcinoma (STAD) | 450 | 20,530 | 580 | 450 | 379 | 19,797 |
| Thyroid Carcinoma (THCA) | 572 | 20,530 | 580 | 570 | 497 | 18,027 |
| Thymoma (THYM) | 122 | 20,530 | 126 | 122 | 117 | 18,470 |
| Uterine Corpus Endometrial Carcinoma (UCEC) | 201 | 20,530 | 596 | 201 | 172 | 19,918 |
| Uterine Carcinosarcoma (UCS) | 57 | 20,530 | 57 | 50 | 50 | 19,059 |
| Uveal Melanoma (UVM) | 80 | 20,530 | 80 | 80 | 76 | 17,239 |
Summary information of 32 types of cancer in TCGA.
| Cancer | Age | Female/Male | Median survival time (all) | Median survival time (event) | Median survival time (censored) | Stage or grade (1/2/3/4/5) |
| ACC | 46.6 ± 15.8 | 48/29 | 38.5 | 18.5 | 45.7 | 9/37/16/15 |
| BLCA | 68.1 ± 10.6 | 105/295 | 17.6 | 13.6 | 20.9 | 2/129/137/132 |
| BRCA | 58.5 ± 13.2 | 1060/0 | 27.2 | 37.9 | 25.0 | 178/605/245/19/13 |
| CESC | 48.0 ± 13.6 | 287/0 | 21.1 | 20.8 | 22.8 | 157/67/43/20 |
| CHOL | 63.0 ± 12.9 | 20/16 | 21.2 | 16.4 | 31.0 | 19/9/1/7 |
| COAD | 65.2 ± 13.3 | 121/149 | 21.6 | 14.5 | 22.0 | 45/109/78/38 |
| DLBC | 55.1 ± 14.7 | 22/19 | 31.1 | 19.6 | 31.8 | 8/17/5/11 |
| ESCA | 62.4 ± 11.9 | 24/135 | 13.2 | 12.9 | 13.4 | 18/78/55/8 |
| GBM | 59.7 ± 13.5 | 48/95 | 11.3 | 12.6 | 7.9 | NA |
| HNSC | 60.9 ± 12.1 | 120/320 | 21.4 | 14.2 | 27.4 | 26/70/81/263 |
| KICH | 51.2 ± 14.1 | 27/38 | 73.9 | 28.1 | 89.2 | 20/25/14/6 |
| KIRC | 60.7 ± 12.1 | 186/340 | 39.7 | 27.0 | 48.2 | 265/56/122/83 |
| KIRP | 61.6 ± 12.0 | 67/189 | 24.1 | 20.5 | 24.9 | 170/20/51/15 |
| LAML | 55.8 ± 16.3 | 74/87 | 11.0 | 9.0 | 23.0 | NA |
| LGG | 43.0 ± 13.4 | 224/278 | 21.6 | 26.8 | 20.5 | 0/241/261/0 |
| LIHC | 58.7 ± 13.5 | 109/234 | 18.7 | 12.5 | 20.9 | 171/85/82/5 |
| LUAD | 65.4 ± 10.0 | 263/223 | 21.4 | 20.1 | 21.6 | 263/117/80/26 |
| LUSC | 67.2 ± 8.5 | 126/355 | 21.3 | 17.8 | 24.1 | 237/154/83/7 |
| MESO | 62.9 ± 9.9 | 16/65 | 16.4 | 15.0 | 38.4 | 9/15/42/15 |
| OV | 59.3 ± 11.0 | 290/0 | 31.3 | 35.0 | 24.7 | 1/18/233/38 |
| PAAD | 64.6 ± 11.0 | 79/96 | 15.2 | 12.9 | 16.7 | 21/147/3/4 |
| PCPG | 47.3 ± 15.1 | 99/77 | 25.1 | 14.9 | 25.4 | NA |
| PRAD | 61.0 ± 6.8 | 0/483 | 30.3 | 43.7 | 30.1 | NA |
| READ | 63.2 ± 12.1 | 37/44 | 24.6 | 19.7 | 25.1 | 10/26/33/12 |
| SARC | 60.7 ± 14.6 | 140/118 | 31.3 | 22.0 | 35.9 | NA |
| SKCM | 58.8 ± 15.6 | 154/255 | 32.9 | 31.6 | 34.1 | 77/139/170/23 |
| STAD | 65.3 ± 10.6 | 136/243 | 14.5 | 11.5 | 18.6 | 53/121/166/39 |
| THCA | 47.2 ± 15.8 | 363/134 | 31.0 | 33.6 | 31.0 | 282/52/110/53 |
| THYM | 58.1 ± 13.1 | 56/61 | 40.1 | 28.0 | 40.7 | 35/61/15/6 |
| UCEC | 65.6 ± 11.4 | 172/0 | 22.1 | 20.5 | 22.3 | 93/24/45/10 |
| UCS | 69.7 ± 9.4 | 50/0 | 20.0 | 16.5 | 26.9 | 21/4/17/8 |
| UVM | 62.8 ± 13.0 | 35/41 | 25.0 | 19.4 | 26.6 | 0/37/35/4 |
FIGURE 1. Comparison of the predictive performance of four models in 16 low-censored cancers. Performance is measured as the difference in C-index relative to the Cox model with only clinical covariates, so a negative value (i.e., a value below the horizontal line) indicates worse performance than the Cox model with only clinical covariates. Predictive performance was assessed across 100 replicates.
FIGURE 2. Performance comparison of coxlmm using the original data sets (denoted by coxlmm) and the permuted data sets (denoted by coxlmm-permuted) across the 32 TCGA cancers. The cancers located to the left of the red dotted line are low-censored, and the cancers on the right side are high-censored.
FIGURE 3. Estimated PCE and PGE for the 32 TCGA cancer types. The cancers located to the left of the red dotted line are low-censored, and the cancers on the right side are high-censored. PCE represents the proportion of the survival variation explained by the clinical information alone. PGE represents the proportion of the survival variation explained by the transcriptome information alone.
FIGURE 4. (A) The predictive accuracy with a different number of genes after sorting by importance for 16 low-censored cancers. (B) The predictive accuracy with a different number of randomly selected genes for 16 low-censored cancers. The thick pink line represents the average C-index of all cancers.
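The performance measure in Figure 1, a C-index difference relative to the clinical-only Cox model, can be illustrated with a minimal sketch. This is not the authors' code: it assumes Harrell's C-index with a simplified treatment of ties (pairs with equal survival times are skipped), and the function names and toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. A comparable pair is concordant when that
    subject also has the higher predicted risk. Pairs with tied times are
    skipped (a simplifying assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def c_index_difference(times, events, model_risk, clinical_risk):
    """C-index of a model minus C-index of the clinical-only baseline."""
    return (concordance_index(times, events, model_risk)
            - concordance_index(times, events, clinical_risk))


# Hypothetical toy data: event = 1 means death observed, 0 means censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
clinical_risk = [0.9, 0.2, 0.5, 0.1]  # baseline: clinical covariates only
full_risk = [0.9, 0.7, 0.5, 0.1]      # clinical plus expression (assumed)

diff = c_index_difference(times, events, full_risk, clinical_risk)
print(f"C-index difference vs. clinical-only baseline: {diff:.3f}")
```

A positive difference means the model ranks patients better than the clinical-only baseline, and a negative difference means it ranks them worse, matching the reading of values above and below the horizontal line in Figure 1.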