Zihang Zeng, Yanping Gao, Jiali Li, Gong Zhang, Shaoxing Sun, Qiuji Wu, Yan Gong, Conghua Xie.
Abstract
BACKGROUND: The Cox proportional hazard regression (CPH) model relies on the proportional hazard (PH) assumption: the effect of each variable on the hazard (its hazard ratio) is constant over time. CPH has been widely used to identify prognostic markers in the transcriptome. However, a comprehensive investigation of the PH assumption in transcriptomic data has been lacking.
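As a brief formal sketch of the PH assumption (the symbols $h_0$, $\beta$, $\gamma$ and $g$ are notation assumed here for illustration, not taken from the abstract), the CPH model for the expression value $x$ of a single gene can be written as

$$
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\{\beta (x_1 - x_2)\}.
$$

The hazard ratio on the right does not depend on $t$, which is what "proportional" means. A violation can be described by a coefficient that changes over time, for example

$$
\beta(t) = \beta + \gamma\, g(t),
$$

where $g(t)$ is some function of time and $\gamma = 0$ recovers the PH case. The Schoenfeld residual test referred to in the figure legends checks for such a time trend in the scaled residuals.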
Keywords: ACC, Adrenocortical carcinoma; AIC, Akaike information criterion; BLCA, Bladder Urothelial Carcinoma; BRCA, Breast invasive carcinoma; CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, Cholangiocarcinoma; COAD, Colon adenocarcinoma; CON, Concordance regression; CPH, Cox proportional hazard regression; Cox regression; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; ESCA, Esophageal carcinoma; GBM, Glioblastoma multiforme; GEO, Gene Expression Omnibus; GO, Gene Ontology; HNSC, Head and Neck squamous cell carcinoma; KICH, Kidney Chromophobe; KIRC, Kidney renal clear cell carcinoma; KIRP, Kidney renal papillary cell carcinoma; LGG, Brain Lower Grade Glioma; LIHC, Liver hepatocellular carcinoma; LUAD, Lung adenocarcinoma; LUSC, Lung squamous cell carcinoma; MESO, Mesothelioma; OS, overall survival; OV, Ovarian serous cystadenocarcinoma; PAAD, Pancreatic adenocarcinoma; PCPG, Pheochromocytoma and Paraganglioma; PH, proportional hazard; PRAD, Prostate adenocarcinoma; Pan-cancer; Proportional hazard assumption; READ, Rectum adenocarcinoma; SARC, Sarcoma; SKCM, Skin Cutaneous Melanoma; STAD, Stomach adenocarcinoma; TCGA; TCGA, The Cancer Genome Atlas; TCGA, tumor abbreviations; TGCT, Testicular Germ Cell Tumors; THCA, Thyroid carcinoma; THYM, Thymoma; Transcriptome; UCEC, Uterine Corpus Endometrial Carcinoma; UCS, Uterine Carcinosarcoma; UVM, Uveal Melanoma
Year: 2022 PMID: 35070171 PMCID: PMC8762368 DOI: 10.1016/j.csbj.2022.01.004
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Study design. PH, proportional hazards; CPH, Cox proportional hazard regression; TCGA, The Cancer Genome Atlas; AIC, Akaike information criterion; GO, Gene Ontology.
Fig. 2. The landscape of violations of the PH assumption for transcriptome genes in TCGA pan-cancer cohorts. (A) Skewness and kurtosis of the distribution of P values from the Schoenfeld residual test; (B) Plot of tumor numbers and numbers of non-proportional hazard genes; (C) QQ plot of Schoenfeld residual test P values against expected P values. The red dotted line represents y = x; (D) Multivariable linear regression fitted to the area under the curve of the QQ plot; (E) Multivariable linear regression fitted to the proportion of P values better than the expected value (log10(proportion*30000)); (F-H) Scaled Schoenfeld residual plots of the RPL7A gene. QQ, Quantile-Quantile; PH, proportional hazard; AUC, area under the curve.
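The QQ-plot summary in panels C and E can be illustrated with a minimal Python sketch. It assumes that "better than expected" means an observed P value smaller than the uniform quantile expected at the same rank, and that the expected quantile for rank i of n is i / (n + 1). Both choices, the function name, and the example P values are assumptions made for illustration, not details confirmed by the legend.

```python
import math

def proportion_below_expected(p_values):
    """Fraction of observed P values that fall below their expected
    uniform quantile i / (n + 1) (assumed definition, see lead-in)."""
    observed = sorted(p_values)
    n = len(observed)
    below = sum(
        1 for rank, p in enumerate(observed, start=1) if p < rank / (n + 1)
    )
    return below / n

# Hypothetical Schoenfeld test P values for five genes (illustrative only).
example_p = [0.5, 0.001, 0.9, 0.02, 0.01]
proportion = proportion_below_expected(example_p)

# Transformation named in panel E of the legend; undefined when proportion is 0.
score = math.log10(proportion * 30000) if proportion > 0 else float("-inf")
print(proportion, score)
```

Under the null hypothesis the observed P values track the y = x line, so a proportion well above one half suggests more small P values (more PH violations) than chance alone would produce.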
Fig. 3. Topological characteristics of proportional and non-proportional genes in TCGA pan-cancer cohorts. (A) Betweenness of proportional and non-proportional genes; (B) Closeness of proportional and non-proportional genes; (C) Degree of proportional and non-proportional genes; (D & E) Gene interaction networks in GBM and LGG; (F) Proportion of gene CPH models that violated the PH assumption in 5 important biological processes. CPH, Cox proportional hazard regression; PH, proportional hazard; TCGA, The Cancer Genome Atlas.
Proportions of hub and non-hub genes with non-proportional hazards in CPH model.
| TCGA ID | PH-Nhub | PH-hub | NPH-Nhub | NPH-hub | hub (NPH%) | Nhub (NPH%) | Chi-Square P |
|---|---|---|---|---|---|---|---|
| ACC | 9161 | 5787 | 534 | 195 | 3.26% | | <0.0001 |
| BLCA | 7787 | 5137 | 1302 | 649 | 11.22% | | <0.0001 |
| BRCA | 7294 | 4889 | 1160 | 714 | 12.74% | 13.72% | 0.1 |
| CESC | 8535 | 5325 | 699 | 485 | 8.35% | 7.57% | 0.09 |
| CHOL | 9655 | 5845 | 575 | 344 | 5.56% | 5.62% | 0.8936 |
| COAD | 8686 | 5538 | 532 | 346 | 5.88% | 5.77% | 0.8075 |
| DLBC | 9235 | 5730 | 547 | 294 | 4.88% | 5.59% | 0.05758 |
| ESCA | 9423 | 5810 | 533 | 270 | 4.44% | | 0.01128 |
| GBM | 8762 | 5074 | 1059 | 946 | | 10.78% | <0.0001 |
| HNSC | 8211 | 5443 | 617 | 270 | 4.73% | | <0.0001 |
| KICH | 8630 | 5264 | 926 | 663 | | 9.69% | 0.003137 |
| KIRC | 8241 | 5114 | 547 | 601 | | 6.22% | <0.0001 |
| KIRP | 8410 | 5346 | 679 | 433 | 7.49% | 7.47% | 0.9857 |
| LGG | 6209 | 3705 | 2763 | 2029 | | 30.80% | <0.0001 |
| LIHC | 6622 | 3752 | 1955 | 1928 | | 22.79% | <0.0001 |
| LUAD | 8410 | 5326 | 859 | 526 | 8.99% | 9.27% | 0.582 |
| LUSC | 8183 | 5197 | 1074 | 631 | 10.83% | 11.60% | 0.1506 |
| MESO | 9134 | 5705 | 753 | 351 | 5.80% | | <0.0001 |
| OV | 8163 | 5239 | 1038 | 581 | 9.98% | | 0.01339 |
| PAAD | 8732 | 5416 | 1054 | 652 | 10.74% | 10.77% | 0.9807 |
| PCPG | 8787 | 5635 | 566 | 274 | 4.64% | | 0.000219 |
| PRAD | 8528 | 5506 | 257 | 182 | 3.20% | 2.93% | 0.3734 |
| READ | 7500 | 4505 | 2203 | 1529 | | 22.70% | 0.0001699 |
| SARC | 8471 | 5429 | 949 | 487 | 8.23% | | 0.0001544 |
| SKCM | 9010 | 5691 | 703 | 280 | 4.69% | | <0.0001 |
| STAD | 8220 | 5326 | 1000 | 578 | 9.79% | | 0.04084 |
| TGCT | 10,221 | 6113 | 136 | 72 | 1.16% | 1.31% | 0.4472 |
| THCA | 8390 | 5475 | 191 | 123 | 2.20% | 2.23% | 0.9562 |
| THYM | 9489 | 5840 | 294 | 145 | 2.42% | | 0.03506 |
| UCEC | 9327 | 5712 | 528 | 310 | 5.15% | 5.36% | 0.591 |
| UCS | 10,226 | 6088 | 307 | 162 | 2.59% | 2.91% | 0.2389 |
| UVM | 8876 | 5651 | 345 | 186 | 3.19% | 3.74% | 0.07955 |
Note: PH, proportional hazards; NPH, non-proportional hazards; Nhub, non-hub; CPH, Cox proportional hazards.
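The percentage and P value columns of the table above can be recomputed from the four gene counts in each row. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and applying Yates' continuity correction is an assumption, since the source does not state which variant of the chi-square test was used.

```python
import math

def nph_percent(nph, ph):
    """Percentage of genes whose CPH model violates the PH assumption."""
    return 100.0 * nph / (nph + ph)

def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test (1 degree of freedom) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed; not confirmed by the source).
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# BRCA row of the table: PH-Nhub, PH-hub, NPH-Nhub, NPH-hub.
ph_nhub, ph_hub, nph_nhub, nph_hub = 7294, 4889, 1160, 714

hub_pct = nph_percent(nph_hub, ph_hub)      # hub (NPH%)
nhub_pct = nph_percent(nph_nhub, ph_nhub)   # Nhub (NPH%)
stat, p_value = chi_square_2x2(nph_hub, ph_hub, nph_nhub, ph_nhub)
print(f"hub {hub_pct:.2f}%  non-hub {nhub_pct:.2f}%  P = {p_value:.4f}")
```

For the BRCA row the two percentages round to the tabulated 12.74% and 13.72%, and the test does not reach significance at the 0.05 level, in line with the reported P value.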
Fig. 4. The non-proportional hazards of genes varied widely across cohorts of the same tumors. (A) The proportion of non-proportional genes in 5 NSCLC datasets; (B) Overlapping patterns of non-proportional genes; (C) Kappa coefficient matrix of panel B; (D) Design for exploring potential factors associated with non-proportionality; (E) Plot of beta values of univariate Cox regression in the patient A and patient B groups (see methods); (F) The 4 patient clusters had differentiated patterns; (G) Boxplot of patient clusters and cellular scores of the microenvironment; (H) Differentially expressed genes for each patient cluster. NSCLC, non-small cell lung cancer; LUAD,
Fig. 5. CPH with time-dependent variables allowed a better fit of prognosis. (A) The log-likelihood and AIC of CPH with different time functions in TCGA pan-cancer cohorts; (B) Proportions of maximum log-likelihood values, across CPH models with different time functions, for genes whose CPH model violated the PH assumption. CPH, Cox proportional hazard regression; AIC, Akaike information criterion.
Proportions of maxima Log-likelihood and minima AIC for non-proportional genes in CPH with different time functions.
| TCGA ID | Patient number | Proportion of gene CPH models that violated PH (%) | Maximum log-likelihood: Non-t | Maximum log-likelihood: t | Maximum log-likelihood: log(t) | Maximum log-likelihood: t^2 | Maximum log-likelihood: sqrt(t) | Maximum log-likelihood: exp(t) | Minimum AIC: Non-t | Minimum AIC: t | Minimum AIC: log(t) | Minimum AIC: t^2 | Minimum AIC: sqrt(t) | Minimum AIC: exp(t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACC | 77 | 5.34% | 0.00% | 15.12% | 13.57% | 14.09% | 20.87% | 0.00% | 15.12% | 13.57% | 14.09% | 20.87% | ||
| BLCA | 405 | 12.44% | 0.00% | 18.18% | 2.35% | 36.48% | 2.12% | 0.19% | 18.18% | 2.35% | 36.40% | 2.12% | ||
| BRCA | 1077 | 13.33% | 0.00% | 12.06% | 6.70% | 26.90% | 1.06% | 0.00% | 12.06% | 6.70% | 26.90% | 1.06% | ||
| CESC | 291 | 7.91% | 0.00% | 16.27% | 5.39% | 36.28% | 4.76% | 0.06% | 16.27% | 5.39% | 36.22% | 4.76% | ||
| CHOL | 36 | 5.64% | 0.00% | 22.61% | 15.56% | 19.21% | 18.30% | 0.00% | 22.55% | 15.44% | 19.21% | 18.48% | ||
| COAD | 282 | 5.87% | 0.00% | 20.91% | 23.61% | 23.46% | 4.42% | 0.00% | 20.91% | 23.61% | 23.46% | 4.42% | ||
| DLBC | 46 | 5.59% | 0.00% | 20.55% | 20.82% | 18.54% | 1.54% | 0.13% | 20.28% | 21.36% | 18.33% | 1.54% | ||
| ESCA | 181 | 4.82% | 0.00% | 10.55% | 16.83% | 20.22% | 15.28% | 0.00% | 10.55% | 16.83% | 20.22% | 15.28% | ||
| GBM | 152 | 9.67% | 0.00% | 15.33% | 25.33% | 3.51% | 10.32% | 0.04% | 15.29% | 25.33% | 3.47% | 10.60% | ||
| HNSC | 517 | 7.48% | 0.00% | 30.29% | 15.55% | 3.55% | 0.75% | 0.07% | 30.15% | 15.55% | 3.55% | 0.89% | ||
| KICH | 65 | 10.92% | 0.00% | 4.46% | 3.16% | 4.03% | 8.06% | 0.00% | 4.43% | 3.16% | 4.07% | 8.06% | ||
| KIRC | 528 | 7.61% | 0.00% | 16.00% | 26.13% | 14.15% | 6.88% | 0.00% | 16.00% | 26.13% | 14.15% | 6.88% | ||
| KIRP | 285 | 8.68% | 0.00% | 16.50% | 12.71% | 18.71% | 6.25% | 0.00% | 16.50% | 12.71% | 18.71% | 6.25% | ||
| LGG | 504 | 29.31% | 0.00% | 5.43% | 16.60% | 27.45% | 6.67% | 0.00% | 5.43% | 16.60% | 27.45% | 6.67% | ||
| LIHC | 363 | 23.01% | 0.00% | 40.65% | 8.29% | 7.54% | 1.83% | 0.00% | 40.65% | 8.29% | 7.54% | 1.83% | ||
| LUAD | 500 | 9.09% | 0.00% | 10.04% | 20.79% | 29.14% | 3.25% | 0.15% | 10.04% | 20.44% | 29.04% | 3.75% | ||
| LUSC | 491 | 11.89% | 0.00% | 34.87% | 10.56% | 7.08% | 2.07% | 0.00% | 34.87% | 10.56% | 7.08% | 2.11% | ||
| MESO | 85 | 7.79% | 0.00% | 27.92% | 16.19% | 20.47% | 6.68% | 0.05% | 27.92% | 16.05% | 20.47% | 6.82% | ||
| OV | 418 | 11.31% | 0.00% | 6.33% | 17.55% | 22.24% | 4.34% | 0.16% | 6.33% | 17.55% | 22.20% | 4.34% | ||
| PAAD | 177 | 10.75% | 0.00% | 22.21% | 15.03% | 24.14% | 8.52% | 0.15% | 22.21% | 15.03% | 24.03% | 8.48% | ||
| PCPG | 177 | 5.56% | 0.00% | 10.65% | 27.22% | 9.89% | 2.59% | 0.15% | 10.27% | 27.83% | 9.73% | 2.81% | ||
| PRAD | 495 | 3.66% | 0.00% | 6.29% | 5.89% | 10.31% | 22.76% | 5.49% | 5.76% | 5.22% | 9.91% | 20.35% | ||
| READ | 91 | 19.22% | 0.00% | 26.04% | 23.51% | 10.34% | 12.76% | 0.00% | 26.04% | 23.51% | 10.34% | 12.76% | ||
| SARC | 258 | 9.16% | 0.00% | 16.87% | 14.78% | 20.16% | 12.19% | 0.00% | 16.87% | 14.78% | 20.16% | 12.19% | ||
| SKCM | 101 | 5.54% | 0.00% | 7.46% | 5.97% | 5.40% | 30.47% | 0.00% | 7.46% | 5.97% | 5.40% | 30.47% | ||
| STAD | 388 | 9.32% | 0.00% | 22.51% | 29.49% | 0.90% | 3.18% | 0.05% | 22.51% | 29.49% | 0.90% | 3.18% | ||
| TGCT | 132 | 1.41% | 0.00% | 17.26% | 28.17% | 9.39% | 1.27% | 7.61% | 14.21% | 21.83% | 6.35% | 13.71% | ||
| THCA | 503 | 2.35% | 0.00% | 17.46% | 17.67% | 24.35% | 11.64% | 2.16% | 16.59% | 17.46% | 23.92% | 11.21% | ||
| THYM | 118 | 3.14% | 0.00% | 5.05% | 12.76% | 19.01% | 3.25% | 0.00% | 5.05% | 12.76% | 18.77% | 3.25% | ||
| UCEC | 178 | 6.15% | 0.00% | 19.62% | 14.25% | 22.62% | 13.23% | 0.64% | 19.55% | 13.74% | 21.98% | 13.93% | ||
| UCS | 56 | 3.24% | 0.00% | 23.82% | 12.07% | 22.04% | 8.18% | 0.10% | 23.82% | 11.86% | 21.93% | 8.39% | ||
| UVM | 79 | 4.26% | 0.00% | 19.62% | 22.40% | 19.43% | 12.39% | 0.00% | 19.62% | 22.40% | 19.43% | 12.39% | ||
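The model comparison summarized in the table above, picking for each non-proportional gene the time function with the maximum log-likelihood or the minimum AIC, can be sketched as follows. The log-likelihood values are invented for illustration only. The parameter counts (one coefficient for the model without a time term, two for each time-dependent model) are an assumption about how the models were specified.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood

def select_models(fits):
    """fits maps a model name to (log_likelihood, n_params).
    Returns (model with maximum log-likelihood, model with minimum AIC)."""
    best_ll = max(fits, key=lambda name: fits[name][0])
    best_aic = min(fits, key=lambda name: aic(*fits[name]))
    return best_ll, best_aic

# Hypothetical fits for two genes (illustrative numbers, not from the study).
gene_a = {
    "non-t": (-250.40, 1), "t": (-249.90, 2), "log(t)": (-248.10, 2),
    "t^2": (-250.00, 2), "sqrt(t)": (-248.90, 2), "exp(t)": (-250.30, 2),
}
gene_b = {
    "non-t": (-250.40, 1), "t": (-250.10, 2), "log(t)": (-249.90, 2),
    "t^2": (-250.30, 2), "sqrt(t)": (-250.00, 2), "exp(t)": (-250.35, 2),
}

result_a = select_models(gene_a)
result_b = select_models(gene_b)
print(result_a, result_b)
```

In this sketch the second gene gains less than one log-likelihood unit from any time term, so the extra parameter is not worth its AIC penalty and the model without a time term is preferred by AIC. This is one way a nested model without a time term can show 0.00% under maximum log-likelihood yet a nonzero share under minimum AIC.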
Fig. 6. CPH with time-dependent variables changed the significance of the logHR of genes. (A) Changes in logHR significance for non-proportional hazard genes with different time-dependent functions in BRCA; (B) Scaled Schoenfeld residual plot of BCL2 in BRCA; (C) CPH of BCL2 in BRCA; (D) Scaled Schoenfeld residual plot of VEGFA in BRCA; (E) CPH of VEGFA in BRCA. CPH, Cox proportional hazard regression.