Carolina Peixoto, Marta Martins, Luís Costa, Susana Vinga.
Abstract
Clear cell renal cell carcinoma (ccRCC) is the most common subtype of RCC and shows a significant percentage of mortality. One of the priorities of kidney cancer research is to identify RCC-specific biomarkers for early detection and screening of the disease. With the development of high-throughput technology, it is now possible to measure the expression levels of thousands of genes in parallel and assess the molecular profile of individual tumors. Studying the relationship between gene expression and survival outcome has been widely used to find genes associated with cancer survival, providing new information for clinical decision-making. One of the challenges of using transcriptomics data is its high dimensionality, which can lead to instability in the selection of gene signatures. Here we identify potential prognostic biomarkers correlated with the survival outcome of ccRCC patients using two network-based regularizers (EN and TCox) applied to Cox models. Some genes that were always selected by each method (COPS7B, DONSON, GTF2E2, HAUS8, PRH2, and ZNF18) have known roles in cancer formation and progression. Afterward, different lists of genes ranked by distinct metrics (logFC of DEGs or β coefficients of the regression) were analyzed using GSEA to find over- or under-represented mechanisms and pathways. Some ontologies were common to the gene sets tested, such as nuclear division, microtubule and tubulin binding, and plasma membrane and chromosome regions. Additionally, the genes most involved in these ontologies and the genes selected by the regularizers were combined into a new gene set, to which we applied the Cox regression model. With this smaller gene set, we were able to significantly split patients into high- and low-risk groups, showing the importance of studying these genes as potential prognostic factors to help clinicians better identify and monitor patients with ccRCC.
Keywords: Cox regression; biomarker selection; gene ontology; kidney cancer; regularization
Year: 2022 PMID: 35954157 PMCID: PMC9367278 DOI: 10.3390/cells11152311
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 7.666
Data distribution regarding each clinical variable of interest: age (mean ± standard deviation), status (dead = 1 and alive = 0), stage (I, II, III and IV), T-stage (I, II, III and IV), N-stage (0 = not yet spread to nearby lymph nodes, 1 = spread to nearby lymph nodes), M-stage (metastasis = 1 and no metastasis = 0), sex (female and male) and race (Caucasian, African American, Asian).
| Variable | Category | KIRC |
|---|---|---|
| Status | 0 | 360 (67%) |
| | 1 | 177 (33%) |
| Stage | I | 269 (50%) |
| | II | 57 (11%) |
| | III | 125 (23%) |
| | IV | 84 (16%) |
| T-stage | I | 275 (51%) |
| | II | 69 (13%) |
| | III | 182 (34%) |
| | IV | 11 (2%) |
| N-stage | 0 | 240 (45%) |
| | 1 | 17 (3%) |
| | x | 280 (52%) |
| M-stage | 0 | 426 (79%) |
| | 1 | 79 (15%) |
| | x | 30 (6%) |
| Sex | Female | 191 (36%) |
| | Male | 346 (64%) |
| Race | Caucasian | 466 (87%) |
| | African American | 56 (10%) |
| | Asian | 8 (2%) |
| | | 7 (1%) |
Summary of the results obtained from Cox models using the two regularizations, EN and TCox. The parameter in the column headers (0.3, 0.2, 0.1, 0.05) controls the sparsity of the model. # genes: number of genes selected. All results are presented as mean values over the 100 runs tested.
| Method | 0.3 | | 0.2 | | 0.1 | | 0.05 | |
|---|---|---|---|---|---|---|---|---|
| | | # genes | | # genes | | # genes | | # genes |
| EN | 1.11 × 10^−18 | 30 | 1.22 × 10^−17 | 48 | 0 | 90 | 1.43 × 10^−16 | 162 |
| TCox | 1.02 × 10^−15 | 18 | 1.77 × 10^−15 | 28 | 8.98 × 10^−16 | 51 | 1.86 × 10^−15 | 87 |
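To make the role of the sparsity parameter concrete, the following is a minimal sketch of an elastic-net-penalized Cox objective. It assumes the common parameterization in which a mixing weight (called `alpha` here) balances the L1 and L2 terms and `lam` sets the overall penalty strength; these names, the Breslow-style partial likelihood without tie correction, and the 1/n scaling are assumptions for illustration, not details confirmed by this record. The TCox penalty is not shown.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model (no tie correction)."""
    # Linear predictor for every patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if d_i != 1:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        nll -= eta[i] - math.log(risk)
    return nll

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def penalized_objective(beta, X, time, event, lam, alpha):
    """Quantity a solver would minimize; scaling conventions vary by library."""
    n = len(time)
    return (cox_neg_log_partial_likelihood(beta, X, time, event) / n
            + elastic_net_penalty(beta, lam, alpha))
```

In this form, a larger L1 share pushes more coefficients to exactly zero, so fewer genes are retained.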
List of the ten most significant DEGs found between tumor and normal tissues from KIRC patients. LogFC—log fold change; FDR—false discovery rate.
| Genes | LogFC | FDR |
|---|---|---|
| | −5.34 | 1.96 × 10^−235 |
| | −6.42 | 2.46 × 10^−225 |
| | −7.10 | 1.40 × 10^−195 |
| | −6.08 | 1.79 × 10^−183 |
| | −6.83 | 9.19 × 10^−179 |
| | −5.68 | 1.23 × 10^−173 |
| | −8.18 | 6.98 × 10^−173 |
| | −4.52 | 9.04 × 10^−171 |
| | −6.88 | 1.07 × 10^−170 |
| | −5.41 | 4.48 × 10^−169 |
List of the ten most significant DEGs found between early (stages I, II and III) and advanced stages (stage IV) from KIRC patients. LogFC—log fold change; FDR—false discovery rate.
| Genes | LogFC | FDR |
|---|---|---|
| | −5.70 | 2.27 × 10^−84 |
| | −4.68 | 4.58 × 10^−67 |
| | −5.88 | 8.13 × 10^−57 |
| | −4.41 | 7.53 × 10^−50 |
| | −4.09 | 6.73 × 10^−49 |
| | −4.20 | 6.43 × 10^−46 |
| | −3.78 | 9.30 × 10^−42 |
| | −4.42 | 4.79 × 10^−39 |
| | −2.46 | 3.31 × 10^−38 |
| | −3.35 | 2.34 × 10^−33 |
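The two DEG tables report a log fold change and an FDR per gene. As a rough illustration of what those columns mean, the sketch below computes a log2 fold change from two group means and adjusts raw p-values with the Benjamini-Hochberg procedure. Both choices (log base 2 with a pseudocount, and Benjamini-Hochberg as the FDR method) are assumptions; this record does not state which differential expression tool or correction was used, and the function names are hypothetical.

```python
import math

def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of group A over group B; negative means lower in A."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    # Toy numbers, not taken from the study.
    print(log2_fold_change(3.0, 15.0))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```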
List of the top 20 genes selected by elastic net in at least 50% of the runs when . Arrows indicate whether a gene is upregulated (↑) or downregulated (↓) in tumor tissue or in the advanced stage of the disease; % is the percentage of runs in which the gene appears in the solution; – marks genes that are not differentially expressed in the corresponding comparison.
| Genes | % | DEGs Tumor Tissue | DEGs in Advanced Stage |
|---|---|---|---|
| | 100 | ↑ | ↑ |
| | 100 | ↑ | ↑ |
| | 100 | ↑ | – |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | – | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | ↓ | ↑ |
| | 98 | ↑ | ↑ |
| | 98 | ↑ | ↓ |
| | 97 | ↑ | ↑ |
| | 96 | ↑ | ↓ |
| | 96 | – | ↑ |
| | 95 | – | ↓ |
| | 95 | ↑ | ↑ |
| | 95 | ↑ | ↑ |
| | 95 | ↑ | ↑ |
| | 94 | ↑ | ↑ |
List of the 20 genes most frequently selected by TCox when . Arrows indicate whether a gene is upregulated (↑) or downregulated (↓) in tumor tissue or in the advanced stage of the disease; % is the percentage of runs in which the gene appears in the solution; – marks genes that are not differentially expressed in the corresponding comparison.
| Genes | % | DEGs Tumor Tissue | DEGs in Advanced Stage |
|---|---|---|---|
| | 100 | ↑ | ↑ |
| | 100 | ↑ | ↑ |
| | 100 | ↑ | ↑ |
| | 100 | – | ↑ |
| | 100 | ↑ | ↓ |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | – |
| | 99 | ↑ | ↑ |
| | 99 | ↑ | – |
| | 99 | ↑ | ↑ |
| | 96 | ↑ | ↑ |
| | 96 | ↓ | ↑ |
| | 96 | ↑ | ↑ |
| | 96 | ↑ | ↓ |
| | 96 | ↑ | ↑ |
| | 96 | ↓ | ↓ |
| | 94 | ↑ | ↑ |
| | 94 | – | – |
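The "%" column in the two selection tables is the share of repeated runs in which a gene ends up in the fitted model. A minimal sketch of that bookkeeping is shown below; the function names and the placeholder gene labels are hypothetical, and the 50% cutoff is taken from the elastic net table caption.

```python
from collections import Counter

def selection_frequency(runs):
    """Percentage of runs in which each gene was selected.

    runs: list of collections of gene names, one collection per run.
    """
    counts = Counter(g for selected in runs for g in set(selected))
    n_runs = len(runs)
    return {gene: 100.0 * c / n_runs for gene, c in counts.items()}

def stable_genes(runs, min_percent=50.0):
    """Genes selected in at least min_percent of runs, most frequent first."""
    freq = selection_frequency(runs)
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, pct) for gene, pct in ranked if pct >= min_percent]

if __name__ == "__main__":
    # Placeholder gene labels, not results from the study.
    toy_runs = [{"A", "B"}, {"A"}, {"A", "C"}, {"A", "B"}]
    print(stable_genes(toy_runs))
```

Ranking by this frequency rather than by a single fit is one way to reduce the instability that high-dimensional data introduces into gene selection.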
Figure 1. Gene ontology enrichment analysis of biological process terms for a list of DEGs ranked by the log fold change between tumor and normal tissues. The left panel shows a dot chart with the most significant BP terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 2. Gene ontology enrichment analysis of molecular function terms for a list of DEGs ranked by the log fold change between tumor and normal tissues. The left panel shows a dot chart with the most significant MF terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 3. Gene ontology enrichment analysis of cellular component terms for a list of DEGs ranked by the log fold change between tumor and normal tissues. The left panel shows a dot chart with the most significant CC terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 4. Gene ontology enrichment analysis of biological process terms for a list of DEGs ranked by the log fold change between early and advanced stages of the disease. The left panel shows a dot chart with the most significant BP terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 5. Gene ontology enrichment analysis of molecular function terms for a list of DEGs ranked by the log fold change between early and advanced stages of the disease. The left panel shows a dot chart with the most significant MF terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 6. Gene ontology enrichment analysis of cellular component terms for a list of DEGs ranked by the log fold change between early and advanced stages of the disease. The left panel shows a dot chart with the most significant CC terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 7. Gene ontology enrichment analysis of biological process terms for a list of genes selected by EN, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant BP terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 8. Gene ontology enrichment analysis of molecular function terms for a list of genes selected by EN, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant MF terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 9. Gene ontology enrichment analysis of cellular component terms for a list of genes selected by EN, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant CC terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 10. Gene ontology enrichment analysis of biological process terms for a list of genes selected by TCox, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant BP terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 11. Gene ontology enrichment analysis of molecular function terms for a list of genes selected by TCox, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant MF terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
Figure 12. Gene ontology enrichment analysis of cellular component terms for a list of genes selected by TCox, ranked by the coefficients of the regression. The left panel shows a dot chart with the most significant CC terms. The right panel shows a gene-concept network plot of the three most enriched terms that depicts the linkages of genes and biological concepts as a network.
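The enrichment analyses in these figures operate on ranked gene lists (by logFC or by regression coefficient). The sketch below shows the core idea behind a GSEA-style score in its simplest, unweighted form: walk down the ranked list, step up at genes that belong to the ontology term and down otherwise, and report the largest deviation from zero. This is a simplified stand-in under stated assumptions; published GSEA implementations typically weight the steps by the ranking metric and assess significance by permutation, neither of which is shown. The function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score, in the range [-1, 1].

    ranked_genes: genes ordered from the highest to the lowest ranking metric.
    gene_set: set of genes annotated to one term.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined; treat as no enrichment
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # placeholder labels
    print(enrichment_score(ranked, {"g1", "g2"}))  # term concentrated at the top
    print(enrichment_score(ranked, {"g5", "g6"}))  # term concentrated at the bottom
```

A positive score indicates that the term's genes cluster near the top of the ranking, and a negative score that they cluster near the bottom.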
Genes most involved in the top three terms of each ontology (BP, MF and CC) for each gene set studied. GS1: logFC tumor vs. normal; GS2: logFC early vs. advanced stage; GS3: EN coefficients; GS4: TCox coefficients.
| GO | GS1 | GS2 | GS3 | GS4 |
|---|---|---|---|---|
Figure 13. Kaplan–Meier curves obtained when applying a multivariate Cox model to a gene set comprising genes correlated with survival outcome in ccRCC and genes associated with an enriched ontology. (a) Full dataset; (b) early-stage patients; (c) metastatic patients.
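As an illustration of how Kaplan–Meier curves for high- and low-risk groups can be produced from a fitted Cox model, the sketch below scores each patient with the linear predictor, splits patients at the median score, and computes a product-limit survival estimate. The median cutoff is an assumption (this record does not state the threshold used), and the function names are hypothetical.

```python
from statistics import median

def risk_groups(X, beta):
    """Label each patient 'high' or 'low' risk by a median split of X @ beta."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(time, event):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    curve, surv = [], 1.0
    event_times = sorted({t for t, d in zip(time, event) if d == 1})
    for t in event_times:
        at_risk = sum(1 for u in time if u >= t)
        deaths = sum(1 for u, d in zip(time, event) if u == t and d == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

if __name__ == "__main__":
    # Toy data, not taken from the study.
    print(risk_groups([[1.0], [2.0], [3.0], [4.0]], [0.5]))
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Computing `kaplan_meier` separately for the patients in each group gives the two curves that a log-rank test would then compare.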
List of genes and corresponding coefficients obtained in a multivariate Cox survival model using ridge regression. HR (hazard ratio) gives the effect size of covariates and is calculated as $\mathrm{HR} = e^{\beta}$. $\mathrm{HR} = 1$, no effect; $\mathrm{HR} < 1$, reduction in the hazard; $\mathrm{HR} > 1$, increase in the hazard.
| Genes | Full Dataset β | Full Dataset HR | Early Stage β | Early Stage HR | Advanced Stage β | Advanced Stage HR |
|---|---|---|---|---|---|---|
| | 0.0930 | 1.10 | 0.0686 | 1.07 | 0.0539 | 1.06 |
| | 0.0840 | 1.09 | 0.0753 | 1.08 | 0.0557 | 1.06 |
| | 0.0731 | 1.09 | 0.0728 | 1.08 | 0.0552 | 1.06 |
| | 0.0481 | 1.08 | 0.0316 | 1.03 | 0.0226 | 1.02 |
| | −0.0977 | 0.91 | −0.0688 | 0.93 | −0.0616 | 0.94 |
| | 0.0687 | 1.07 | 0.0428 | 1.04 | 0.0271 | 1.03 |
| | 0.0313 | 1.03 | 0.0240 | 1.02 | 0.0228 | 1.02 |
| | 0.0647 | 1.07 | 0.0734 | 1.08 | 0.0363 | 1.04 |
| | 0.0427 | 1.04 | 0.0552 | 1.06 | 0.0251 | 1.03 |
| | 0.0227 | 1.02 | −0.0151 | 0.99 | 0.0258 | 1.03 |
| | 0.0544 | 1.06 | 0.0097 | 1.01 | 0.0491 | 1.05 |
| | 0.0053 | 1.01 | 0.0101 | 1.01 | 0.0210 | 1.02 |
| | −0.0335 | 0.97 | −0.0213 | 0.98 | −0.0335 | 0.97 |
| | −0.0166 | 0.98 | −0.0111 | 0.99 | −0.0115 | 0.99 |
| | −0.0045 | 1.00 | −0.0082 | 0.99 | 0.0276 | 1.03 |
| | 0.0187 | 1.02 | 0.0080 | 1.01 | 0.0445 | 1.05 |
| | 0.0121 | 1.01 | 0.0101 | 1.01 | 0.0315 | 1.03 |
| | −0.0029 | 1.00 | −0.0045 | 1.00 | 0.0415 | 1.04 |
| | 0.0208 | 1.02 | 0.0063 | 1.01 | 0.0278 | 1.03 |
| | 0.0135 | 1.01 | −0.0173 | 0.98 | 0.0381 | 1.04 |
| | −0.0058 | 0.99 | −0.0015 | 1.00 | 0.0090 | 1.01 |
| | 0.0213 | 1.02 | 0.0046 | 1.00 | 0.0302 | 1.03 |
| | 0.0356 | 1.04 | 0.0392 | 1.04 | 0.0334 | 1.03 |
| | 0.0345 | 1.04 | 0.0263 | 1.03 | 0.0346 | 1.04 |
List of genes previously selected by both EN and TCox regularizers and corresponding coefficients obtained when we applied a multivariate Cox survival model with ridge penalization. HR (hazard ratio) gives the effect size of covariates and is calculated as $\mathrm{HR} = e^{\beta}$. $\mathrm{HR} = 1$, no effect; $\mathrm{HR} < 1$, reduction in the hazard; $\mathrm{HR} > 1$, increase in the hazard.
| Genes | Full Dataset β | Full Dataset HR | Early Stage β | Early Stage HR | Advanced Stage β | Advanced Stage HR |
|---|---|---|---|---|---|---|
| | 0.1946 | 1.21 | 0.1920 | 1.21 | 0.2365 | 1.27 |
| | 0.1728 | 1.19 | 0.1857 | 1.20 | 0.2226 | 1.25 |
| | 0.1371 | 1.15 | 0.0857 | 1.09 | 0.0421 | 1.04 |
| | −0.2417 | 0.79 | −0.1781 | 0.84 | −0.2341 | 0.79 |
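The coefficient and hazard ratio columns in the two Cox tables are linked by exponentiation of the coefficient. The short sketch below reproduces two full-dataset entries of the last table from their coefficients and attaches the interpretation given in the captions; the function names are hypothetical.

```python
import math

def hazard_ratio(beta):
    """Hazard ratio implied by a Cox coefficient."""
    return math.exp(beta)

def interpret(beta):
    """Map a coefficient to the reading given in the table captions."""
    if beta > 0:
        return "increase in the hazard"
    if beta < 0:
        return "reduction in the hazard"
    return "no effect"

if __name__ == "__main__":
    # Full-dataset coefficients from the first and last rows of the table above.
    for beta in (0.1946, -0.2417):
        print(f"beta={beta:+.4f}  HR={hazard_ratio(beta):.2f}  {interpret(beta)}")
```

Because these are coefficients of a multivariate model, each hazard ratio describes the multiplicative change in hazard per unit increase in that gene's covariate with the other genes held fixed.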