Mary Qu Yang, Dan Li, William Yang, Yifan Zhang, Jun Liu, Weida Tong.
Abstract
Clear cell renal cell carcinoma (ccRCC) is the most common and most aggressive form of renal cell cancer (RCC). The incidence of RCC has increased steadily in recent years, yet its pathogenesis remains poorly understood. Many of the tumor suppressor genes, oncogenes, and dysregulated pathways in ccRCC have yet to be identified, and identifying them is needed to improve the overall clinical outlook of the disease. Here, we developed a systems biology approach to prioritize the somatically mutated genes that lead to dysregulation of pathways in ccRCC. The method integrates multi-layer information to infer causative mutations and disease genes. First, we identified differential gene modules (DGMs) in ccRCC by coupling transcriptome data with protein-protein interactions. Each module consisted of interacting genes involved in similar biological processes whose combined expression alterations were significantly associated with disease type. Subsequent gene module-based eQTL analysis then revealed somatically mutated genes that drove the expression alterations of the differential gene modules. Our study yielded a list of candidate disease genes, including several known ccRCC causative genes such as BAP1 and PBRM1, as well as novel genes such as NOD2, RRM1, CSRNP1, SLC4A2, TTLL1 and CNTN1. The differential gene modules and their driver genes provide a new perspective for understanding the molecular mechanisms underlying the disease. Moreover, we validated the results in independent ccRCC patient datasets. Our study provides a new method for prioritizing disease genes and pathways.
Keywords: AUC, Area Under Curve; Causative mutation; DEG, Differentially expressed gene; DGM, Differential gene module; Gene module; KEGG, Kyoto Encyclopedia of Genes and Genomes; Pathways; Protein-protein interaction; RCC, Renal cell cancer; ROC, Receiver Operating Characteristic; SVM, Support vector machine; TCGA, The Cancer Genome Atlas; ccRCC; ccRCC, Clear cell renal cell carcinoma; eQTL; eQTL, Expression quantitative trait loci
Year: 2017 PMID: 29158875 PMCID: PMC5683705 DOI: 10.1016/j.csbj.2017.09.003
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. The procedure of our study. After the differentially expressed gene modules (DGMs) were identified by coupling PPI with gene expression, somatic mutations were linked with the DGMs using eQTL analysis. Here, SMA-DGM refers to somatic mutations associated with the DGMs.
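The second step of this workflow, linking a somatic mutation to a module's expression, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a module's activity in a sample is the mean expression of its member genes, and it uses Welch's t-statistic to compare samples with and without a given mutation. The function names, the toy genes `G1` and `G2`, and all values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def module_activity(expression, module_genes):
    """Mean expression of the module's genes in each sample.

    expression: dict mapping gene name -> list of per-sample values.
    (Averaging member genes is an assumption made for illustration.)
    """
    n_samples = len(expression[module_genes[0]])
    return [mean(expression[g][s] for g in module_genes) for s in range(n_samples)]


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def module_mutation_association(expression, module_genes, mutated):
    """Compare module activity between mutated and wild-type samples."""
    activity = module_activity(expression, module_genes)
    mut = [a for a, is_mut in zip(activity, mutated) if is_mut]
    wt = [a for a, is_mut in zip(activity, mutated) if not is_mut]
    return welch_t(mut, wt)


# Hypothetical toy data: two genes, four samples, last two samples mutated.
toy_expression = {"G1": [1.0, 3.0, 5.0, 7.0], "G2": [3.0, 3.0, 7.0, 7.0]}
toy_mutated = [False, False, True, True]
t_stat = module_mutation_association(toy_expression, ["G1", "G2"], toy_mutated)
```

A large absolute t-statistic would flag the mutation as a candidate driver of the module; in a real analysis the statistic would be converted to a p-value and corrected for multiple testing across all mutation-module pairs.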
Table 1. The performance of DGM- and DEG-based hierarchical clustering and SVM classifiers on the TCGA ccRCC patient group and three independent ccRCC datasets.
| ccRCC patient cohort | Normal | Tumor | Misclustered tissue samples (DGM-based) | Misclustered tissue samples (DEG-based) | AUC of the classifier (DGM-based) | AUC of the classifier (DEG-based) |
|---|---|---|---|---|---|---|
| TCGA-ccRCC | 72 | 539 | 2 | 4 | **0.942** | **0.767** |
| GSE36895 | 23 | 29 | 0 | 0 | 0.923 | 1.0 |
| GSE46699 | 63 | 67 | 9 | 15 | 0.953 | 0.949 |
| GSE40435 | 101 | 101 | 0 | 0 | 0.956 | 0.997 |
The DGM-based classifier outperformed the DEG-based classifier by 22.8% ((0.942 - 0.767) / 0.767; Table 1, bold numbers) on TCGA-ccRCC, which is an imbalanced dataset (72 normal vs. 539 tumor samples).
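The 22.8% figure is a relative improvement in AUC, not an absolute difference. A short Python check of the arithmetic, using only the two AUC values quoted above (the function name is our own):

```python
def relative_improvement(new_score, baseline_score):
    """Fractional gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


# AUC values quoted in the text for the TCGA-ccRCC cohort.
gain = relative_improvement(0.942, 0.767)
print(f"{gain:.1%}")  # prints 22.8%
```

The absolute difference between the two AUCs is 0.175; dividing by the DEG-based baseline of 0.767 gives the reported percentage.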
Fig. 2. Performance comparison of clustering and classification based on DGMs and DEGs. (A) Hierarchical clustering of the 539 ccRCC and 72 normal tissues from TCGA based on the expression of DGMs (top) and DEGs (bottom). (B) Hierarchical clustering of tumor and normal tissues from an independent ccRCC dataset (GSE46699) based on DGMs (top) and DEGs (bottom). (C) ROC curves of the classifiers based on DGM and DEG expression for the TCGA dataset (left panel) and GSE46699 (right panel).
Fig. 3. Examples of differentially expressed gene modules. Genes colored in red were up-regulated, whereas genes colored in green were down-regulated in ccRCC. The color intensity is proportional to the log2 fold-change in gene expression. Circular nodes denote genes whose expression levels changed significantly, whereas diamond nodes denote genes without significantly altered expression levels. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
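For readers unfamiliar with the quantity that drives the node colors, the following Python sketch shows how a log2 fold-change and its direction might be computed from mean expression in tumor and normal tissue. The function names and the optional `pseudocount` parameter are illustrative assumptions, and the input values are made up.

```python
from math import log2


def log2_fold_change(tumor_mean, normal_mean, pseudocount=0.0):
    """log2 ratio of tumor to normal mean expression.

    pseudocount is an optional guard against zero expression values
    (an assumption for illustration, not taken from the source).
    """
    return log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def direction(lfc):
    """Map the sign of a log2 fold-change to a regulation label."""
    if lfc > 0:
        return "up"
    if lfc < 0:
        return "down"
    return "unchanged"


# Made-up example: expression is four times higher in tumor than in normal.
lfc = log2_fold_change(8.0, 2.0)
label = direction(lfc)
```

A positive value corresponds to the red (up-regulated) nodes and a negative value to the green (down-regulated) nodes; whether a change counts as significant is a separate statistical test that this sketch does not cover.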
A total of 188 DGMs were significantly associated with five or fewer mutated genes (FDR < 0.0001).
| Num. of mutated gene(s) associated with each DGM | Num. of associated DGMs | Mutated genes associated with the DGMs |
|---|---|---|
| 1 | 116 | |
| 2 | 47 | |
| 3 | 16 | |
| 4 | 4 | |
| 5 | 5 | |
The number in parentheses after a gene symbol is the number of DGMs linked with that mutated gene.
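The table applies an FDR threshold to the mutation-DGM associations. The source does not state which FDR procedure was used; assuming the common Benjamini-Hochberg adjustment, a minimal pure-Python sketch looks like this (the function name and the toy p-values are hypothetical):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four mutation-module tests.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_pvalues)
significant = [q < 0.05 for q in toy_fdr]
```

Under this assumption, an association would be retained when its adjusted value falls below the chosen cutoff (0.0001 in the table above; 0.05 in the toy example).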
Fig. 4. Analysis of the genes that harbored significant somatic mutations and were associated with the DGMs. (A) ccRCC patients with BAP1 somatic mutations had a poor five-year survival rate. (B) Five FDA-approved cancer drugs (colored in gold) target RRM1. (C) Mutations of BAP1 and PBRM1 tended to be mutually exclusive (P < 0.03).
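Mutual exclusivity between two genes, as in panel (C), is often assessed with a one-sided Fisher's exact test on a 2x2 table of mutation counts. The source does not say which test produced the reported P value, so the Python sketch below is an assumption, and the counts in the example are invented rather than taken from the study.

```python
from math import comb


def mutual_exclusivity_pvalue(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for fewer co-mutations than expected.

    Returns P(co-mutated samples <= both) under the hypergeometric null,
    holding the number of samples mutated in each gene fixed.
    """
    n_a = both + only_a            # samples mutated in gene A
    n_b = both + only_b            # samples mutated in gene B
    total = both + only_a + only_b + neither
    denom = comb(total, n_b)
    lowest = max(0, n_a + n_b - total)  # smallest possible overlap
    p = 0.0
    for k in range(lowest, both + 1):
        p += comb(n_a, k) * comb(total - n_a, n_b - k) / denom
    return p


# Invented counts: 10 samples, 5 mutated in A only, 5 in B only, none in both.
p_toy = mutual_exclusivity_pvalue(both=0, only_a=5, only_b=5, neither=0)
```

A small p-value indicates that the two genes are co-mutated less often than chance would predict given how frequently each is mutated.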