Po-Wen Wang<sup>1,2</sup>, Yi-Hsun Su<sup>1,3</sup>, Po-Hao Chou<sup>1,2</sup>, Ming-Yueh Huang<sup>4</sup>, Ting-Wen Chen<sup>5,6,7</sup>.
Abstract
BACKGROUND: Pan-cancer studies have revealed many commonalities and differences in mutations, copy number variations, and gene expression alterations among cancers. Some of these features are significantly associated with clinical outcomes, and many prognosis-predictive biomarkers or biosignatures have been proposed for specific cancer types. Here, we systematically explored the biological functions and the distribution of survival-related genes (SRGs) across cancers.
Keywords: Biomarker; Cox proportional hazards model; Log-rank test; Pan-cancer; Survival; Univariate analysis
Year: 2022 PMID: 35508961 PMCID: PMC9066720 DOI: 10.1186/s12864-022-08581-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Summary of 33 cancer types in TCGA
| Cancer Type (Abbreviation) | Sample Number | Event Number | Applicable Genes (Log-Rank Test) | Applicable Genes (Cox Regression) |
|---|---|---|---|---|
| Adrenocortical carcinoma (ACC) | 79 | 43 | 16,921 | 16,146 |
| Bladder urothelial carcinoma (BLCA) | 407 | 229 | 17,469 | 16,137 |
| Breast invasive carcinoma (BRCA) | 1080 | 204 | 17,658 | 12,023 |
| Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) | 291 | 88 | 17,531 | 16,171 |
| Cholangiocarcinoma (CHOL) | 36 | 22 | 17,510 | 15,992 |
| Colon adenocarcinoma (COAD) | 279 | 105 | 17,454 | 16,698 |
| Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC) | 47 | 16 | 16,897 | 16,439 |
| Esophageal carcinoma (ESCA) | 184 | 113 | 18,118 | 17,285 |
| Glioblastoma multiforme (GBM) | 152 | 131 | 17,655 | 16,469 |
| Head and neck squamous cell carcinoma (HNSC) | 519 | 271 | 17,699 | 16,248 |
| Kidney chromophobe (KICH) | 65 | 12 | 17,175 | 16,421 |
| Kidney renal clear cell carcinoma (KIRC) | 531 | 223 | 17,662 | 16,531 |
| Kidney renal papillary cell carcinoma (KIRP) | 287 | 72 | 17,357 | 13,727 |
| Acute myeloid leukemia (LAML) | 151 | 92 | 16,477 | 14,445 |
| Brain lower grade glioma (LGG) | 511 | 201 | 17,801 | 14,036 |
| Liver hepatocellular carcinoma (LIHC) | 366 | 225 | 16,931 | 14,581 |
| Lung adenocarcinoma (LUAD) | 502 | 258 | 17,764 | 14,936 |
| Lung squamous cell carcinoma (LUSC) | 495 | 252 | 17,989 | 17,170 |
| Mesothelioma (MESO) | 85 | 80 | 17,562 | 16,705 |
| Ovarian serous cystadenocarcinoma (OV) | 302 | 231 | 17,968 | 16,849 |
| Pancreatic adenocarcinoma (PAAD) | 177 | 122 | 18,007 | 12,693 |
| Pheochromocytoma and paraganglioma (PCPG) | 179 | 23 | 17,373 | 14,346 |
| Prostate adenocarcinoma (PRAD) | 497 | 97 | 17,700 | 17,191 |
| Rectum adenocarcinoma (READ) | 94 | 29 | 17,575 | 15,504 |
| Sarcoma (SARC) | 259 | 153 | 17,375 | 15,299 |
| Skin cutaneous melanoma (SKCM) | 102 | 44 | 17,298 | 15,242 |
| Stomach adenocarcinoma (STAD) | 393 | 195 | 17,967 | 17,338 |
| Testicular germ cell tumors (TGCT) | 134 | 36 | 18,471 | 13,477 |
| Thyroid carcinoma (THCA) | 500 | 60 | 17,435 | 16,059 |
| Thymoma (THYM) | 119 | 24 | 17,646 | 17,055 |
| Uterine corpus endometrial carcinoma (UCEC) | 174 | 49 | 17,640 | 17,047 |
| Uterine carcinosarcoma (UCS) | 56 | 41 | 17,986 | 17,606 |
| Uveal melanoma (UVM) | 80 | 34 | 16,620 | 16,088 |
Fig. 1 The workflow for data pre-processing, model fitting and functional analysis. The flowchart illustrates the workflow of the present paper. RNA-Seq and clinical survival data were retrieved from Broad GDAC Firehose, using mRNA expression data from Illumina HiSeq. The RSEM-derived TPM values were log2-transformed and standardized. Genes with a median absolute deviation (MAD) greater than zero were merged with the clinical survival data. In the model-fitting step, the derived data were either applied directly to the log-rank test or first examined for the proportional hazards assumption before the Cox model was applied. Both models were fitted separately for each gene in each cancer type. The result tables summarize the information generated by the models. Multiple-testing correction was performed before the results were analysed by pathway enrichment and clustering. Abbreviations: RSEM, RNA-Seq by Expectation Maximization; RMST, restricted mean survival time; TPM, transcripts per million
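The pre-processing steps in the Fig. 1 caption (log2 transform, standardization, MAD > 0 filter) can be sketched as below. This is a minimal illustration, not the authors' code: the pseudocount of 1 added before taking log2, the per-gene z-scoring across samples, and the function name `preprocess_expression` are assumptions. Filtering on MAD before standardizing avoids dividing by a zero standard deviation, because any gene with MAD > 0 also has a non-zero standard deviation.

```python
import numpy as np


def preprocess_expression(tpm, pseudocount=1.0):
    """Log2-transform, MAD-filter and z-score a genes x samples TPM matrix.

    Hypothetical helper; pseudocount and z-scoring choices are assumptions.
    Returns (z-scores of the kept genes, boolean mask of kept genes).
    """
    logged = np.log2(np.asarray(tpm, dtype=float) + pseudocount)
    median = np.median(logged, axis=1, keepdims=True)
    mad = np.median(np.abs(logged - median), axis=1)    # per-gene median absolute deviation
    keep = mad > 0                                       # genes with MAD == 0 are dropped
    kept = logged[keep]
    # standardize each gene across samples (population SD, ddof=0)
    z = (kept - kept.mean(axis=1, keepdims=True)) / kept.std(axis=1, keepdims=True)
    return z, keep
```

For example, a gene whose TPM is zero in most samples has MAD = 0 after the transform and is removed, while a gene with TPM values 1, 3, 7, 15 is kept and rescaled to mean 0 and standard deviation 1.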
Fig. 2 Pan-cancer analysis of survival-related genes from the log-rank test. Benjamini-Hochberg adjusted p-values from the log-rank test were log10-transformed. For harmful genes, absolute values were taken and are shown in red; protective genes are shown in blue. Grey indicates non-significant cases (FDR ≥ 0.05), and white indicates inapplicable cases. Each row shows the log p-values of one gene across cancer types. Each column represents a cancer type with at least 100 significant genes (FDR < 0.05). Genes not significant in any cancer type are not shown. Rows were clustered by Euclidean distance with complete linkage, and columns by Pearson distance with complete linkage. Organ systems are indicated by different colors. The scale bar at the bottom indicates the range of log p-values
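The quantities behind Fig. 2 can be illustrated with a from-scratch two-group log-rank test, Benjamini-Hochberg adjustment, and the signed log10 FDR used for coloring. This is a sketch under stated assumptions, not the authors' pipeline: how patients were split into groups is not described in this section, so the `high` argument (e.g. expression above the median) is an assumption, as is the convention that a gene is "harmful" when the high group has more deaths than expected. All function names are hypothetical.

```python
import math
import numpy as np


def logrank_test(time, event, high):
    """Two-group log-rank test (hypothetical helper).

    time  : follow-up times
    event : True if death observed, False if censored
    high  : True for the assumed high-expression group
    Returns (chi2, p_value, observed-minus-expected deaths in the high group).
    """
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    high = np.asarray(high, bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):              # each distinct death time
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & high).sum()
        dead = event & (time == t)
        d, d1 = dead.sum(), (dead & high).sum()
        o_minus_e += d1 - d * n1 / n              # observed minus expected deaths
        if n > 1:                                 # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return float("nan"), float("nan"), o_minus_e
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))            # upper tail of chi-square with 1 df
    return chi2, p, o_minus_e


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]   # keep adjusted values monotone
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def signed_log_fdr(fdr, o_minus_e):
    """|log10 FDR| for harmful genes (excess deaths in the high group), log10 FDR otherwise."""
    score = abs(math.log10(fdr))
    return score if o_minus_e > 0 else -score
```

With six patients whose three high-expression members die first, `logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [True, True, True, False, False, False])` gives a positive observed-minus-expected count (1.85) and a p-value below 0.05, so the gene would be plotted as harmful. In a per-gene scan, the raw p-values of all genes in one cancer type would be passed together to `benjamini_hochberg`.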
Fig. 3 Pan-cancer analysis of survival-related genes from Cox regression. Significant regression coefficients from Cox regression were clustered. Harmful genes are shown in red and protective genes in blue. Grey indicates non-significant cases (FDR ≥ 0.05), and white indicates inapplicable cases. Each row shows the Cox coefficients of one gene across cancer types. Each column represents a cancer type with at least 100 significant genes (FDR < 0.05). Genes not significant in any cancer type are not shown. Rows and columns were clustered by Pearson distance with complete linkage. Organ systems are indicated by different colors. The scale bar at the bottom indicates the range of Cox coefficients
Fig. 4 Pathway enrichment of survival-related genes from the log-rank test. SRGs from the log-rank test were tested for enrichment in Gene Ontology (GO) terms. Pathways with FDR < 0.001 are displayed in the heatmap. Significant pathways were manually grouped according to their relationships in Gene Ontology. The names of the grouped pathways are shown on the right, and the GO IDs representing them are shown on the left. The grouped pathways are further categorized by biological function, as indicated in the annotation legend at the bottom right. Block colors indicate the gene ratio, i.e., the percentage of SRGs enriched in each pathway
Fig. 5 Pathway enrichment of survival-related genes from the Cox regression. SRGs from the Cox regression were tested for enrichment in Gene Ontology (GO) terms. Pathways with FDR < 0.001 are displayed in the heatmap. Significant pathways were manually grouped according to their relationships in Gene Ontology. The names of the grouped pathways are shown on the right, and the GO IDs representing them are shown on the left. The grouped pathways are further categorized by biological function, as indicated in the annotation legend at the bottom right. Block colors indicate the gene ratio, i.e., the percentage of SRGs enriched in each pathway
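The enrichment tool used for Figs. 4 and 5 is not named in this section. A common approach is a one-sided hypergeometric (over-representation) test per GO term, sketched below for a single term. The gene-ratio definition used here, k/n (SRGs in the term divided by all SRGs in the background), is an assumption, as is the function name; across many terms the p-values would then be FDR-adjusted before applying the FDR < 0.001 cut-off.

```python
from math import comb


def go_term_enrichment(srgs, term_genes, universe):
    """Over-representation of SRGs in one GO term (hypothetical helper).

    Returns (gene_ratio, p_value): gene_ratio = k / n is the assumed share of
    SRGs that fall in the term; p_value is P(X >= k), X ~ Hypergeometric(N, K, n).
    """
    universe = set(universe)
    srgs = set(srgs) & universe            # SRGs present in the background
    term = set(term_genes) & universe      # term genes present in the background
    N, K, n = len(universe), len(term), len(srgs)
    k = len(srgs & term)                   # SRGs annotated to the term
    gene_ratio = k / n if n else 0.0
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return gene_ratio, tail / comb(N, n)
```

For a 10-gene background, a 4-gene term and 3 SRGs that all fall in the term, the gene ratio is 1.0 and the p-value is 4/120, or about 0.033.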
Comparison between survival-related genes and cancer driver genes
| Cancer Type | # of SRGs<sup>a</sup> | DriverDBV3: Mutation<sup>b</sup> | DriverDBV3: CNV | DriverDBV3: Methylation |
|---|---|---|---|---|
| *Log-rank test* | | | | |
| KIRC | 7770 | 46.9% (340/725) ** | 0% (0/3) | 50% (6/12) |
| LGG | 6691 | 32.5% (570/1752) | 82.2% (37/45) *** | – |
| ACC | 5243 | 19.7% (74/375) | 32.3% (40/124) | – |
| UVM | 3765 | 25% (9/36) | 62% (75/121) *** | – |
| LIHC | 2359 | 9.5% (106/1113) | 9.6% (8/83) | 6.6% (12/181) |
| PRAD | 1538 | 5.1% (58/1141) | 23.5% (4/17) | 4.4% (3/68) |
| MESO | 586 | 2.5% (1/40) | 0% (0/1) | – |
| PAAD | 443 | 1.6% (16/971) | 0% (0/2) | 16% (4/25) ** |
| KIRP | 238 | 1.9% (12/631) | 1.1% (1/87) | 2.2% (5/224) |
| BLCA | 39 | 0.2% (6/2404) | 0% (0/104) | 0% (0/443) |
| CESC | 35 | 0.2% (4/1815) | 0% (0/52) | – |
| LAML | 29 | 0% (0/413) | 0% (0/3) | – |
| HNSC | 18 | 0% (0/1914) | 1% (1/103) | 0% (0/89) |
| STAD | 10 | 0.1% (4/3695) | 0% (0/86) | – |
| LUAD | 8 | 0% (1/2903) | 0% (0/37) | 0% (0/18) |
| SKCM | 1 | 0% (0/4559) | 0% (0/18) | – |
| *Cox regression* | | | | |
| KIRC | 7091 | 38.3% (278/725) | 0% (0/3) | 50% (6/12) |
| ACC | 6467 | 25.9% (97/375) | 43.5% (54/124) | – |
| UVM | 5600 | 33.3% (12/36) | 84.3% (102/121) *** | – |
| LGG | 5418 | 27.2% (476/1752) | 42.2% (19/45) *** | – |
| PAAD | 4287 | 21.2% (206/971) | 0% (0/2) | 36% (9/25) * |
| LIHC | 2384 | 10% (111/1113) | 8.4% (7/83) | 9.9% (18/181) |
| PRAD | 2068 | 8.1% (92/1141) | 35.3% (6/17) * | 5.9% (4/68) |
| MESO | 728 | 2.5% (1/40) | 0% (0/1) | – |
| KIRP | 592 | 2.4% (15/631) | 2.3% (2/87) | 1.8% (4/224) |
| BLCA | 442 | 2.7% (64/2404) | 1% (1/104) | 1.6% (7/443) |
| KICH | 407 | 3.7% (2/54) | – | – |
| CESC | 230 | 1.4% (25/1815) | 0% (0/52) | – |
| HNSC | 204 | 0.9% (17/1914) | 1.9% (2/103) | 0% (0/89) |
| LAML | 158 | 0.7% (3/413) | 0% (0/3) | – |
| LUAD | 71 | 0.4% (12/2903) | 5.4% (2/37) ** | 0% (0/18) |
| PCPG | 64 | 2% (1/49) | 0% (0/5) | – |
| BRCA | 24 | 0.1% (2/2162) | 0% (0/220) | 0% (0/36) |
| UCEC | 5 | 0% (0/6669) | 0% (0/76) | – |
| STAD | 3 | 0% (1/3695) | 0% (0/86) | – |
| SARC | 2 | 0% (0/543) | 0% (0/104) | – |
| THCA | 2 | 0% (0/192) | 0% (0/3) | 0% (0/201) |
Note: One-tailed Fisher's exact test of the overlap between SRGs and cancer driver genes; *p < 0.05, **p < 0.01, ***p < 0.001. The first number in parentheses is the count of genes that are both SRGs and cancer driver genes, and the second is the total number of driver genes that are also applicable genes in the specified cancer type
– Not available; no driver genes were described for those cancer types
<sup>a</sup> For both models, SRGs are defined as genes with FDR < 0.05
<sup>b</sup> Mutation-based driver genes were merged from the results of 14 tools summarized by DriverDBV3
| | SRGs | Non-SRGs | Total |
|---|---|---|---|
| Driver genes | a | b | R1 |
| Not driver genes | c | d | R2 |
| Total | C1 | C2 | Applicable genes |
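Given the 2x2 layout above, the one-tailed Fisher's exact test asks how likely an overlap of at least a genes is when R1 driver genes are drawn at random from all applicable genes, C1 of which are SRGs. The sketch below computes that hypergeometric tail directly; the function name is hypothetical, and in practice `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` returns the same one-sided p-value.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed (enrichment) Fisher's exact test for [[a, b], [c, d]].

    Rows: driver / not driver genes; columns: SRG / non-SRG.
    Returns P(overlap >= a) with margins R1, C1 and the applicable-gene total fixed.
    """
    r1, c1 = a + b, a + c          # R1 = driver genes, C1 = SRGs
    n = a + b + c + d              # applicable genes
    tail = sum(comb(c1, k) * comb(n - c1, r1 - k) for k in range(a, min(r1, c1) + 1))
    return tail / comb(n, r1)
```

For the toy table a = 3, b = 0, c = 0, d = 3 (every driver gene is an SRG), the p-value is 1/20 = 0.05.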