Carolina Peixoto, Marta B Lopes, Marta Martins, Luís Costa, Susana Vinga.
Abstract
Colorectal cancer (CRC) is one of the leading causes of mortality and morbidity in the world. Because CRC is a heterogeneous disease, its therapy and prognosis pose a significant challenge to medical care. Molecular information improves the accuracy with which patients are classified and treated, since similar pathologies may show different clinical outcomes and different responses to treatment. However, the high dimensionality of gene expression data makes the selection of novel genes a difficult task. We propose TCox, a novel penalization function for Cox models, which promotes the selection of genes that have distinct correlation patterns in normal vs. tumor tissues. We compare TCox to other regularized survival models: Elastic Net (EN), HubCox, and OrphanCox. Gene expression and clinical data from CRC and normal samples in The Cancer Genome Atlas (TCGA) are used for model evaluation. Each model is tested 100 times. Within a specific run, eighteen of the features selected by TCox are also selected by the other survival regression models tested, which suggests that they play an important role in the survival of colorectal cancer patients. Moreover, the TCox model exclusively selects genes able to categorize patients into significant risk groups. Our work demonstrates the ability of the proposed weighted regularizer TCox to disclose novel molecular drivers in CRC survival by accounting for correlation-based network information from both tumor and normal tissue. The results presented support the relevance of network information for biomarker identification in high-dimensional gene expression data and foster new directions for the development of network-based feature selection methods in precision oncology.
Keywords: Cox regression; RNA-seq data; TCGA data; regularized optimization; survival analysis
Year: 2020 PMID: 33182598 PMCID: PMC7696515 DOI: 10.3390/biomedicines8110488
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
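The weighting idea described in the abstract (penalize less the genes whose correlation pattern differs between tumor and normal tissue) can be illustrated with a minimal sketch. This record does not give the TCox formula, so everything below is an assumption made for illustration: the angle between a gene's two correlation vectors is used as the dissimilarity, the weight is the inverse of the normalized dissimilarity, and the function names `correlation_dissimilarity` and `penalty_weights` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples):
    """samples: list of rows (one per sample), each a list of gene expression values."""
    genes = list(zip(*samples))  # one tuple of values per gene
    return [[pearson(gi, gj) for gj in genes] for gi in genes]


def correlation_dissimilarity(tumor, normal):
    """Angle between each gene's correlation vector in tumor vs. normal data.

    Assumption: the angle is one plausible dissimilarity; it is not taken
    from this record.
    """
    angles = []
    for rt, rn in zip(correlation_matrix(tumor), correlation_matrix(normal)):
        dot = sum(a * b for a, b in zip(rt, rn))
        norm = math.sqrt(sum(a * a for a in rt)) * math.sqrt(sum(b * b for b in rn))
        if norm == 0:
            angles.append(0.0)
            continue
        cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.acos(cosine))
    return angles


def penalty_weights(dissimilarity, eps=1e-8):
    """Hypothetical weights: a larger dissimilarity yields a smaller penalty."""
    d_max = max(dissimilarity)
    normalized = [d / d_max if d_max > 0 else 0.0 for d in dissimilarity]
    return [1.0 / (d + eps) for d in normalized]


# Toy data: 4 samples x 3 genes. Gene 1 flips its correlation with gene 0
# between normal and tumor tissue, so it should be penalized least.
normal = [[1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4]]
tumor = [[1, 4, 1], [2, 3, 3], [3, 2, 2], [4, 1, 4]]
weights = penalty_weights(correlation_dissimilarity(tumor, normal))
```

Weights of this kind would typically be passed to a penalized Cox solver as per-feature penalty factors; the solver itself is outside the scope of this sketch.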
Figure 1. Methodological procedure for the identification of gene signatures in colorectal cancer data.
Figure 2. p-values obtained in the separation of high- and low-risk survival curves based on the genes selected by TCox models generated with transformations of w using colorectal RNA-seq data, tested over different values.
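A p-value for the separation of high- and low-risk survival curves is commonly obtained with a log-rank test. The sketch below is a plain-Python illustration of that test, not the authors' code. Splitting patients at the median risk score is an assumption, and the names `split_by_median` and `logrank_p_value` are hypothetical.

```python
import math
import statistics


def split_by_median(risk_scores):
    """Return True for patients whose risk score is above the median (assumed rule)."""
    cutoff = statistics.median(risk_scores)
    return [score > cutoff for score in risk_scores]


def logrank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (death observed) or 0 (censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = at_risk_a + at_risk_b
        deaths = sum(1 for u, e, _ in data if u == t and e)
        deaths_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the deaths observed in group A.
            variance += deaths * (at_risk_a / n) * (at_risk_b / n) * (n - deaths) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Toy example: the high-risk group dies early, the low-risk group late.
risk = [2.1, 1.9, 1.7, 1.5, 1.3, 0.9, 0.7, 0.5, 0.3, 0.1]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
high = split_by_median(risk)
p_separated = logrank_p_value(
    [t for t, h in zip(times, high) if h], [e for e, h in zip(events, high) if h],
    [t for t, h in zip(times, high) if not h], [e for e, h in zip(events, high) if not h],
)
p_identical = logrank_p_value([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```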
Results from 100 runs of training and test sets in all survival models analyzed using . S—statistically significant runs (p-value ); NS—non-statistically significant runs; #—number of runs.
| Model | NA (# runs) | S (# runs) | NS (# runs) | p-value (S) | p-value (NS) |
|---|---|---|---|---|---|
| TCox | 33 | 7 | 60 | 0.0164 | 0.4985 |
| EN | 31 | 4 | 65 | 0.0251 | 0.5354 |
| HubCox | 43 | 3 | 54 | 0.0137 | 0.5168 |
| OrphanCox | 32 | 2 | 66 | 0.0160 | 0.4997 |
Summary of TCox, EN, HubCox, and OrphanCox model results showing the number of selected variables and the p-values obtained for the training and test sets.
| Survival Models | | Selected Variables | p-value (Train) | p-value (Test) |
|---|---|---|---|---|
| | 0.3 | 10 | 0.002401583 | 0.0757 |
| | 0.2 | 11 | 0.000588251 | 0.0665 |
| | 0.1 | 53 | 2.66444 × 10⁻⁹ | 0.0194 |
| | | | | |
| | 0.3 | 18 | 8.38703 × 10⁻⁷ | 0.0088 |
| | 0.2 | 47 | 2.47428 × 10⁻⁸ | 0.0717 |
| | 0.1 | 88 | 5.28787 × 10⁻⁹ | 0.0492 |
| | | | | |
| | 0.3 | 26 | 1.78804 × 10⁻⁸ | 0.0138 |
| | 0.2 | 47 | 1.18224 × 10⁻⁸ | 0.0129 |
| | 0.1 | 90 | 2.74104 × 10⁻⁹ | 0.0418 |
| | | | | |
| | 0.3 | 8 | 2.48965 × 10⁻⁵ | 0.1519 |
| | 0.2 | 44 | 1.20494 × 10⁻⁷ | 0.0327 |
| | 0.1 | 67 | 6.80248 × 10⁻⁹ | 0.0632 |
Figure 3. Kaplan–Meier curves obtained from the (a) training and (b) test sets, based on the variables selected by the TCox model with .
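For readers unfamiliar with Kaplan–Meier curves, the estimator behind such plots can be sketched in a few lines. This is a generic illustration with made-up data, not the authors' analysis; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


# Toy data: the patient with time 2 is censored and only shrinks the risk set.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one such curve for the high-risk group and one for the low-risk group gives the pair of step functions that a Kaplan–Meier figure displays.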
List of genes selected for at least 50% or 75% of the runs by all methods tested.
| Runs | TCox | EN | HubCox | OrphanCox |
|---|---|---|---|---|
| ≥ 75% | 3 | 2 | 2 | 1 |
| ≥ 50% | 16 | 16 | 16 | 1 |
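Selecting the genes that recur in at least 50% or 75% of the runs amounts to counting how often each gene appears across the per-run selections. A minimal sketch follows; the gene labels are placeholders and the function name `stable_features` is hypothetical.

```python
from collections import Counter


def stable_features(selected_per_run, min_fraction):
    """Return the features selected in at least `min_fraction` of the runs."""
    n_runs = len(selected_per_run)
    # set(run) guards against a feature being counted twice within one run.
    counts = Counter(f for run in selected_per_run for f in set(run))
    return sorted(f for f, c in counts.items() if c / n_runs >= min_fraction)


# Placeholder selections from four runs (not real gene names).
runs = [["A", "B"], ["A"], ["A", "B", "C"], ["A", "C"]]
in_half = stable_features(runs, 0.50)
in_three_quarters = stable_features(runs, 0.75)
```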
Figure 4. p-values obtained for survival models applied to the test sets, using different -values.
Figure 5. Venn diagram representing the number of genes selected by EN (yellow), HubCox (green), OrphanCox (red), and TCox (blue) using .
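The regions of a Venn diagram like this one (genes shared by all models and genes exclusive to a single model) reduce to set operations. The sketch below uses placeholder gene labels, and the function name `shared_and_exclusive` is hypothetical.

```python
def shared_and_exclusive(selections):
    """selections: dict mapping model name to an iterable of selected genes.

    Returns (genes selected by every model, {model: genes selected only by it}).
    """
    sets = {model: set(genes) for model, genes in selections.items()}
    shared = set.intersection(*sets.values())
    exclusive = {}
    for model, genes in sets.items():
        others = set().union(*(s for m, s in sets.items() if m != model))
        exclusive[model] = genes - others
    return shared, exclusive


# Placeholder selections (not the genes reported in the paper).
shared, exclusive = shared_and_exclusive({
    "TCox": ["g1", "g2", "g5"],
    "EN": ["g1", "g2", "g3"],
    "HubCox": ["g1", "g3"],
    "OrphanCox": ["g1", "g4"],
})
```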
Genes selected by all models evaluated and selected exclusively by EN, HubCox, OrphanCox, and TCox. Arrows indicate if genes were found to be up- (↑) or down-regulated (↓) in tumoral tissue (differential gene expression analysis was performed using the edgeR R package).
Figure 6. Genes selected by all models tested associated with the hallmarks of cancer, given by the CHAT. The value corresponds to the number of hits found in the literature, where light and dark blue correspond to a low and high number of hits, respectively.
Figure 7. Genes selected by the HubCox and EN models associated with the hallmarks of cancer, given by the CHAT. (a) HubCox; (b) EN. The value corresponds to the number of hits found in the literature, where light and dark blue correspond to a low and high number of hits, respectively.
Figure 8. Genes selected by the TCox method associated with the hallmarks of cancer, given by the CHAT. The value corresponds to the number of hits found in the literature, where light and dark blue correspond to a low and high number of hits, respectively.
Figure 9. Survival curves obtained for the genes exclusively selected by the TCox method, when analyzed individually.