F Graeme Frost, Praveen F Cherukuri, Samuel Milanovich, Cornelius F Boerkoel.
Abstract
Numerous genetic and epigenetic alterations cause functional changes in cell biology underlying cancer. These hallmark functional changes constitute potentially tissue-independent anticancer therapeutic targets. We hypothesized that RNA-Seq identifies gene expression changes that underlie those hallmarks, and thereby defines relevant therapeutic targets. To test this hypothesis, we analysed the publicly available TCGA-TARGET-GTEx gene expression data set from the University of California Santa Cruz Toil recompute project, using WGCNA to delineate co-correlated 'modules' from tumour gene expression profiles and functional enrichment of these modules to hierarchically cluster tumours. This stratified tumours according to T cell activation, NK-cell activation, complement cascade, ATM, Rb, angiogenic, MAPK, ECM receptor and histone modification signalling. These correspond to the cancer hallmarks of avoiding immune destruction, tumour-promoting inflammation, evading growth suppressors, inducing angiogenesis, sustained proliferative signalling, activating invasion and metastasis, and genome instability and mutation. This approach did not detect pathways corresponding to the cancer hallmarks of enabling replicative immortality, resisting cell death or deregulating cellular energetics. We conclude that RNA-Seq stratifies tumours along some, but not all, hallmarks of cancer and, therefore, could be used collectively with other analyses to inform precision therapy.
Keywords: RNA-seq; cancer; hallmarks of cancer; pan-cancer; precision medicine; precision oncology; transcriptome; wgcna
Year: 2019 PMID: 31730267 PMCID: PMC6933344 DOI: 10.1111/jcmm.14746
Source DB: PubMed Journal: J Cell Mol Med ISSN: 1582-1838 Impact factor: 5.310
Characteristics of the data sets used in this study
| Data set | Abbreviated name | Sample types | Number of samples | Number of primary sites | Number of genes |
|---|---|---|---|---|---|
| TcgaTargetGtex_rsem_gene_tpm | TTG | Tumour and normal | 19,109 | 47 | 60,498 |
| TCGA‐TARGET‐GTEx_coding | TTG‐C | Tumour and normal | 19,109 | 47 | 19,672 |
| TTG_coding_common_primary | TTG‐C‐PS | Tumour and normal | 12,166 | 15 | 19,672 |
| Tumour_coding_common_primary | T‐C‐PS | Tumour | 7,272 | 15 | 19,672 |
| Normal_coding_common_primary | N‐C‐PS | Normal | 4,894 | 15 | 19,672 |
The 60,498 ‘genes’ in the TcgaTargetGtex_gene_expected_count data set include various species of non‐coding RNAs and pseudogenes, each of which has a unique Ensembl ENSG identifier.
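The restriction from 60,498 identifiers to the 19,672 protein‐coding genes in the table above can be sketched as a simple annotation filter. This is a minimal illustration only: the annotation layout (a `gene_biotype` column indexed by Ensembl ID), the function name and the toy identifiers are assumptions, not details taken from the study.

```python
import pandas as pd

def restrict_to_protein_coding(expr: pd.DataFrame, annotation: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows of `expr` (genes x samples) annotated as protein coding.

    `annotation` is assumed to be indexed by Ensembl gene ID and to carry a
    'gene_biotype' column; this layout is hypothetical.
    """
    coding_ids = annotation.index[annotation["gene_biotype"] == "protein_coding"]
    # A boolean mask preserves the original row order of the expression matrix.
    return expr.loc[expr.index.isin(coding_ids)]

# Toy data with made-up identifiers (not real Ensembl IDs).
expr = pd.DataFrame(
    {"sample_1": [1.0, 2.0, 3.0], "sample_2": [4.0, 5.0, 6.0]},
    index=["ENSG_A", "ENSG_B", "ENSG_C"],
)
annotation = pd.DataFrame(
    {"gene_biotype": ["protein_coding", "lncRNA", "protein_coding"]},
    index=["ENSG_A", "ENSG_B", "ENSG_C"],
)
coding = restrict_to_protein_coding(expr, annotation)
```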
Figure 1. A flowchart depicting the analyses used in this study. Transcriptome profiles were first restricted to protein‐coding genes (A), then two different primary site‐correction approaches were taken to analyse the three data sets in parallel (B). Each data set was analysed using WGCNA to identify groups of genes (modules) that were co‐correlated, and variable across cancers (C). Genes found in modules were put through pathway enrichment analysis (WebGestalt) and used for hierarchical clustering (D).
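To make the idea of co‐correlated genes concrete, the sketch below computes an unsigned soft‐threshold adjacency, the quantity WGCNA builds its network from: the absolute pairwise correlation raised to a power. It is not the WGCNA package (an R library) and omits the topological overlap and module detection steps. The function name and the power `beta = 6` are illustrative assumptions, not values reported here.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Return a genes x genes unsigned adjacency matrix |cor|^beta.

    `expr` is a samples x genes array. `beta` is an illustrative
    soft-thresholding power, not a value taken from the study.
    """
    cor = np.corrcoef(expr, rowvar=False)  # Pearson correlation between gene columns
    return np.abs(cor) ** beta

# Toy data: gene 0 and gene 1 are perfectly correlated, gene 2 is uncorrelated with gene 0.
expr = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
])
adj = soft_threshold_adjacency(expr)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, so strongly co‐expressed genes dominate the network.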
Figure 2. Heat maps of module expression within cancer clusters. Heat maps are shown for (A) uncorrected, (B) tissue‐corrected and (C) grand mean‐corrected RNA‐Seq data. WGCNA‐identified modules (left colour bar) are composed of protein‐coding genes with TPM values co‐correlated and variable across cancers. For panel A, module expression is the mean of TPM + 1 values for all genes within a module. For panels B and C, module expression is ln(Tumour/Normal) as defined in the Methods. Clusters of similar tumours (numbered divisions across the top) were defined by hierarchical clustering using the cosine distance between the genes included in the modules and Ward's method for agglomeration. The anatomical primary sites of tumours are graphically portrayed by the colour bar along the top.
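The clustering step described in this caption, cosine distance with Ward agglomeration, might look roughly as follows in SciPy. This is a sketch on toy data, not the study's pipeline. The function name and the number of clusters are assumptions. Note that SciPy documents Ward linkage as strictly well defined only for Euclidean distances, so passing cosine distances is a pragmatic approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_tumours(profiles: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster tumours (rows) on their module-gene expression (columns).

    Returns 1-based cluster labels, one per tumour.
    """
    dist = pdist(profiles, metric="cosine")  # condensed pairwise cosine distances
    tree = linkage(dist, method="ward")      # Ward agglomeration on those distances
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy data: two groups of five tumours with opposite expression patterns.
rng = np.random.default_rng(0)
pattern_a = np.array([10.0] * 5 + [1.0] * 5)
pattern_b = np.array([1.0] * 5 + [10.0] * 5)
profiles = np.vstack([
    pattern_a + rng.uniform(0, 1, (5, 10)),
    pattern_b + rng.uniform(0, 1, (5, 10)),
])
labels = cluster_tumours(profiles, n_clusters=2)
```

Cosine distance compares the shape of expression profiles rather than their magnitude, so tumours with proportionally similar module expression cluster together even when overall expression levels differ.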
Figure 3. Graphs showing the distribution of each cancer type among clusters for (A) uncorrected, (B) tissue‐corrected and (C) grand mean‐corrected RNA‐Seq data. Count represents the number of tumours. Colours within the bars represent the cancer anatomical primary sites as described in the key.
Figure 4. Heat maps of the module expression for each anatomical primary site (colour bar along the top). Heat maps are shown for (A) uncorrected, (B) tissue‐corrected and (C) grand mean‐corrected RNA‐Seq data. WGCNA‐identified modules (left colour bar) are composed of protein‐coding genes with TPM values co‐correlated and variable across cancers. For panel A, module expression is the mean of TPM + 1 values for all genes within a module. For panels B and C, module expression is ln(Tumour/Normal) as defined in the Methods.
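The two module expression measures named in the captions can be illustrated as below. The mean of TPM + 1 follows the caption directly. The log ratio is only a guess at the general form of ln(Tumour/Normal): the exact definition is in the Methods, which are not reproduced here, so the choice of a single averaged normal reference, and both function names, are assumptions.

```python
import numpy as np

def module_mean_expression(tpm: np.ndarray, gene_idx: list) -> np.ndarray:
    """Mean of TPM + 1 over the genes of one module, per sample.

    `tpm` is a genes x samples array; `gene_idx` lists the module's gene rows.
    """
    return (tpm[gene_idx, :] + 1.0).mean(axis=0)

def module_log_ratio(tumour_tpm: np.ndarray, normal_tpm: np.ndarray, gene_idx: list) -> np.ndarray:
    """ln(Tumour/Normal) for one module, per tumour sample.

    ASSUMPTION: the normal reference is the module expression averaged over
    all supplied normal samples; the study's Methods may define it differently.
    """
    tumour = module_mean_expression(tumour_tpm, gene_idx)
    reference = module_mean_expression(normal_tpm, gene_idx).mean()
    return np.log(tumour / reference)

# Toy data: one module of two genes, two tumour samples, two normal samples.
tumour_tpm = np.array([[1.0, 3.0], [3.0, 7.0]])
normal_tpm = np.array([[2.0, 2.0], [2.0, 2.0]])
module_genes = [0, 1]
mean_expr = module_mean_expression(tumour_tpm, module_genes)
log_ratio = module_log_ratio(tumour_tpm, normal_tpm, module_genes)
```

Under this reading, a tissue‐corrected ratio would pass normal samples from the same primary site as `normal_tpm`, whereas a grand mean‐corrected ratio would pass normal samples from all sites; that interpretation is likewise an assumption.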
The number of clusters with more tumours from a primary site than expected by chance as determined by the hypergeometric test
| Data set | Bladder | Brain | Breast | Colon | Oesophagus | Kidney | Liver | Lung | Ovary | Pancreas | Prostate | Skin | Stomach | Testis | Uterus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UC | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| TC | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GMC | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Abbreviations: UC, uncorrected data; TC, tissue‐corrected data; GMC, grand mean‐corrected data
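The hypergeometric test behind this table asks whether a cluster contains more tumours from one primary site than random sampling would give. A minimal sketch with SciPy follows. The function name, the parameter names and the toy counts are illustrative assumptions, and the significance threshold used in the study is not stated here.

```python
from scipy.stats import hypergeom

def site_enrichment_pvalue(n_total: int, n_site: int, n_cluster: int, n_site_in_cluster: int) -> float:
    """One-sided p-value P(X >= n_site_in_cluster).

    X is the number of tumours from the site found in a cluster of size
    `n_cluster`, drawn without replacement from `n_total` tumours of which
    `n_site` come from that site.
    """
    # sf(k - 1) gives P(X > k - 1) = P(X >= k).
    return float(hypergeom.sf(n_site_in_cluster - 1, n_total, n_site, n_cluster))

# Toy counts: 100 tumours, 20 from the site, a cluster of 10 holding 8 of them.
p_enriched = site_enrichment_pvalue(n_total=100, n_site=20, n_cluster=10, n_site_in_cluster=8)
# Only 3 of the 10 cluster members come from the site: a much weaker signal.
p_weak = site_enrichment_pvalue(n_total=100, n_site=20, n_cluster=10, n_site_in_cluster=3)
```

A cluster would count towards the table when its p‐value for a site falls below the chosen significance threshold, typically after correction for the number of cluster and site pairs tested.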