Emmanuel Martinez-Ledesma, Roeland G W Verhaak, Victor Treviño.
Abstract
Cancer types are commonly classified by histopathology and, more recently, by molecular characteristics such as gene expression, mutations, copy number variations, and epigenetic alterations. These molecular characterizations have led to the proposal of prognostic biomarkers for many cancer types. Nevertheless, most of these biomarkers have been proposed for a specific cancer type or even for specific subtypes. Although more challenging, it is useful to identify biomarkers that can be applied to multiple types of cancer. Here, we have used a network-based exploration approach to identify a multi-cancer gene expression biomarker, highly connected by ESR1, PRKACA, LRP1, JUN and SMAD2, that can be predictive of clinical outcome in 12 types of cancer from The Cancer Genome Atlas (TCGA) repository. The gene signature of this biomarker is highly supported by cancer literature, biological terms, and prognostic power in other cancer types. Additionally, the signature does not seem to be highly associated with specific mutations or copy number alterations. Comparisons with cancer-type-specific and other multi-cancer biomarkers in TCGA and other datasets showed that the performance of the proposed multi-cancer biomarker is superior, making the proposed approach and multi-cancer biomarker potentially useful in research and clinical settings.
Year: 2015 PMID: 26202601 PMCID: PMC5378879 DOI: 10.1038/srep11966
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Cancer datasets used for our study.
| BLCA | Bladder Urothelial Carcinoma | 54/35 | RNA-Seq |
| BRCA | Breast Invasive Carcinoma | 502/437 | Agilent |
| COADREAD* | Colon and Rectum Adenocarcinoma | 151/134 | Agilent |
| GBM | Glioblastoma | 538/116 | Affymetrix |
| HNSC | Head and Neck Squamous Cell | 283/164 | RNA-Seq |
| KIRC | Kidney Renal Clear Cell | 468/313 | RNA-Seq |
| LAML | Acute Myeloid Leukemia | 168/60 | RNA-Seq |
| LUAD | Lung Adenocarcinoma | 255/175 | RNA-Seq |
| LUSC | Lung Squamous Cell | 205/120 | RNA-Seq |
| OV | Ovarian Serous Cystadenocarcinoma | 578/276 | Affymetrix |
| UCEC | Uterine Corpus Endometrial Carcinoma | 333/305 | RNA-Seq |
| MULTI | All cancers above | 3535/2135 | |
*Colon and rectal adenocarcinoma datasets (COADREAD) were fused as in the TCGA pan-cancer analyses.
Figure 1. Schematic representation of the network clinical association algorithm (NCA).
Starting from a single seed gene (black), the first cycle generates modules that each include the seed gene and one of its connected genes (blue). The 6 modules of 2 genes are then evaluated by their goodness of fit in a Cox survival model. Only grown modules that improve the evaluation (filled blue circles) are considered for the next growth cycle, and only a proportion of the best improved modules are explored further (represented by a percentage of the distribution of all modules, shown in green, evaluated in the 4th cycle). This procedure continues until no improvement is observed. The NCA algorithm was run for each cancer type and for all cancer datasets combined (multi-NCA).
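The growth procedure described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `grow_modules`, `score`, and `keep_fraction` are hypothetical, the network is a plain adjacency dictionary, and `score` stands in for the goodness of fit of a Cox survival model (higher is better). The toy weights at the end are invented purely to exercise the loop.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Graph = Dict[str, Set[str]]
Module = FrozenSet[str]


def grow_modules(
    graph: Graph,
    seed: str,
    score: Callable[[Module], float],
    keep_fraction: float = 0.5,
    max_cycles: int = 50,
) -> Tuple[Module, float]:
    """Greedy module growth from one seed gene (hypothetical sketch).

    `score` is a stand-in for a Cox-model goodness of fit; higher is better.
    """
    best: Module = frozenset([seed])
    best_score = score(best)
    frontier = [(best, best_score)]
    for _ in range(max_cycles):
        candidates: Dict[Module, float] = {}
        for module, parent_score in frontier:
            # Genes connected to any member of the module, not yet included.
            neighbours = set().union(*(graph.get(g, set()) for g in module)) - module
            for gene in neighbours:
                grown = module | {gene}
                if grown in candidates:
                    continue
                s = score(grown)
                if s > parent_score:  # keep only modules that improve the fit
                    candidates[grown] = s
        if not candidates:
            break  # no grown module improved: stop
        # Rank by score, breaking ties by gene names for determinism.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
        n_keep = max(1, int(len(ranked) * keep_fraction))
        frontier = ranked[:n_keep]  # explore only the best proportion
        if frontier[0][1] > best_score:
            best, best_score = frontier[0]
    return best, best_score


# Toy network and toy additive score (assumptions for illustration only).
toy_graph: Graph = {
    "S": {"A", "B", "C"},
    "A": {"S", "D"},
    "B": {"S"},
    "C": {"S"},
    "D": {"A"},
}
toy_weights = {"S": 0, "A": 2, "B": 1, "C": -1, "D": 3}
best, best_score = grow_modules(
    toy_graph, "S", lambda m: sum(toy_weights[g] for g in m)
)
```

In a real analysis the `score` callable would fit a survival model on the expression of the genes in the module; it is kept abstract here so the sketch has no external dependencies.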
Network modules obtained for each cancer type using the NCA algorithm.
| BLCA | 10,303 | 10 | 41 | |
| BRCA | 485 | 9 | 42 | JAK2, NFKBIA, |
| COADREAD | 252 | 13 | 36 | |
| GBM | 2,142 | 9 | 42 | EFEMP2, MAPK3, TP53, TOP1, CCDC6, SREBF1, GJA1 |
| HNSC | 661 | 9 | 41 | DUSP16, KRT8, RAF1, MED1, PPARG, YWHAB, FABP1 |
| KIRC | 2,841 | 4 | 41 | AR, HGS, RUNX1, |
| LAML | 584 | 8 | 42 | GUCY2C, PTPRA, SRC, STAT5B, WAS, KCNQ5, CALM1 |
| LUAD | 808 | 9 | 42 | DOK1, FUT4, INSR, ITGB2, SHC1, PTPRC, KHDRBS1 |
| LUSC | 84 | 14 | 37 | |
| OV | 421 | 9 | 42 | |
| UCEC | 1,570 | 10 | 41 | CREBBP, GTF2B, CSNK2A1, CTNNB1, HOXD4, HIPK1, PTEN |
| MULTI | 2 | 44 | 41 | |
*The complete lists of genes and samples used are shown in Supplementary Table 1.
**Highest connected genes. Genes in boldface type are repeated more than once in this list.
Figure 2. Comparison of biomarkers generated by the network clinical association (NCA) algorithm.
Panel A shows the number of genes that were included in any two biomarkers. Underlined numbers represent the number of genes per biomarker. Red indicates high overlaps and blue indicates no overlap. The “Sum” row shows the total number of overlaps with other biomarkers while the “Unique” row shows the number of unique genes that overlap. Panel B shows the C-index evaluation of NCA biomarkers (rows) across cancer datasets (columns). Underlined numbers represent the biomarkers evaluated within the cancer dataset. Red indicates high values within the cancer dataset (column) and blue indicates low values. Boldface and framed values represent significant predictions using 10,000 random models of the same length. The “Average” row shows the average C-index per cancer type and the “Average” column shows the mean C-index per biomarker. Panel C shows the NCA biomarkers (horizontal) evaluated in all datasets using C-index (vertical axis). The mean is shown as a horizontal line. Panel D shows cancer types (horizontal) evaluated with all biomarkers using C-index (vertical axis).
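For readers unfamiliar with the C-index used in Panels B to D, the following is a minimal sketch of the concordance index in its common Harrell form. The source does not state which estimator or software was used, so this is an assumption; the function name `concordance_index` and the toy data are hypothetical, and tied survival times are simply ignored.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Fraction of comparable patient pairs that the risk score orders correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's last follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): the third patient is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.2, 0.4])
c_one_miss = concordance_index(toy_times, toy_events, [0.9, 0.3, 0.2, 0.4])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the figure compares each biomarker against random models of the same length.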
Figure 3. The multi-NCA biomarker identified when all databases were combined.
Panel A shows the genes and network identified. The connections correspond to data from the PPI database used. The most connected genes were PRKACA, ESR1, LRP1, SMAD2 and JUN. Panel B shows the risk group prediction (splitting the prognostic index by the median) of the multi-NCA biomarker across cancer datasets. Panel C depicts the color-coded differential expression of genes between risk groups. Darker red indicates more significant differences; the scale is expressed as -log10 of the t-test p value, and only p values <0.01 are highlighted. Darker purple indicates more significant hazard ratio associations within the Cox model; the scale is expressed as -log10 of the Z p value, and only p values <0.05 are highlighted. Panel D shows, at the top, the curated biological terms and pathways associated with the genes composing the biomarker. The associations of genes with specific cancers based on the literature are shown at the bottom.
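The median split in Panel B and the -log10 scale in Panel C can be illustrated with a short sketch. The function names, the convention that values equal to the median fall in the low-risk group, and the toy numbers are all assumptions made for illustration; the source does not specify how ties at the median were handled.

```python
import math
from statistics import median
from typing import List, Sequence


def split_by_median(prognostic_index: Sequence[float]) -> List[str]:
    """Label each sample 'high' or 'low' risk relative to the median index.

    Assumption: samples exactly at the median are labelled 'low'.
    """
    cutoff = median(prognostic_index)
    return ["high" if pi > cutoff else "low" for pi in prognostic_index]


def neg_log10(p_value: float) -> float:
    """Transform a p value to the -log10 scale used for color coding."""
    return -math.log10(p_value)


# Toy prognostic indices (invented values).
toy_index = [0.1, -0.3, 0.8, 0.0]
toy_groups = split_by_median(toy_index)

# The highlighting thresholds quoted in the caption, on the -log10 scale.
t_test_threshold = neg_log10(0.01)
cox_threshold = neg_log10(0.05)
```

On this scale the caption's cutoffs of p < 0.01 and p < 0.05 correspond to values above 2 and above roughly 1.3, respectively.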
Cox model results showing how well the multi-NCA cancer biomarker fit across datasets.
| BLCA | 1.00 | 6.3 | 3.3 | 0 | 0 |
| BRCA | 0.80 | 9.4 | 7.8 | 11 | 17 |
| COADREAD | 1.00 | 4.7 | 2.5 | 7 | 8 |
| GBM | 0.65 | 5.8 | 5.7 | 8 | 19 |
| HNSC | 0.75 | 7.7 | 7.0 | 9 | 15 |
| KIRC | 0.74 | 8.7 | 8.1 | 5 | 21 |
| LAML | 0.77 | 11 | 9.6 | 9 | 10 |
| LUAD | 0.81 | 5.1 | 4.7 | 13 | 16 |
| LUSC | 0.77 | 8.5 | 7.4 | 13 | 9 |
| OV | 0.66 | 9.2 | 5.8 | 11 | 18 |
| UCEC | 0.92 | 5.7 | 3.5 | 19 | 13 |
Figure 4. Evaluation of all biomarkers in SurvExpress using C-index.
The PGC biomarker, derived from other authors, is not shown (0.74 for all datasets and 0.81 per tissue) to emphasize biomarkers with higher C-index values.