Heiko Horn1,2, Michael S Lawrence2,3, Candace R Chouinard2, Yashaswi Shrestha2, Jessica Xin Hu1,2, Elizabeth Worstell1,2, Emily Shea2, Nina Ilic2,4, Eejung Kim2,4, Atanas Kamburov2,3, Alireza Kashani1,2, William C Hahn2,4, Joshua D Campbell2,5, Jesse S Boehm2, Gad Getz2,3, Kasper Lage1,2,6.
Abstract
Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes; but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.
Year: 2017 PMID: 29200198 PMCID: PMC5985961 DOI: 10.1038/nmeth.4514
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. NetSig predicts true cancer genes
a) Areas under the receiver operating characteristic curve (AUCs) for genes in the “Cosmic classic” and “recently emerging” sets are 0.86 and 0.75, respectively (adj. P < 0.05). Genes from the random set fit the null hypothesis (AUC 0.49, nominal P = 0.75). b) AUCs after removing the effect of very well-established cancer genes are 0.79, 0.73, and 0.5 for the “Cosmic classic”, “recently emerging”, and random sets, respectively. c) Visualization of the NetSig5000 set. Genes are represented as individual dots and plotted along the x-axis by the NetSig Q value from the most significant of 21 tumor types, and on the y-axis by the NetSig Q value when 4,742 tumors are analyzed as a combined pan-cancer cohort. Significance at FDR Q ≤ 0.1 is indicated on each axis by grey lines.
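As an illustration of what the AUC values in panels a) and b) measure, the following minimal Python sketch computes an AUC as the probability that a randomly chosen gene from a positive set receives a higher score than a randomly chosen gene from a background set. The function name `auc_from_scores` and all scores are hypothetical; this is not the NetSig implementation, only the generic rank-based definition of the AUC.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive scores higher; ties count as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores (for example, -log10 of a network-based P value).
# These numbers are made up for illustration and are not from the study.
known_drivers = [3.2, 2.5, 1.9, 0.7]
background_genes = [1.1, 0.9, 0.4, 0.3, 0.2]

auc = auc_from_scores(known_drivers, background_genes)
print(f"AUC = {auc:.2f}")
```

Under this definition an AUC near 0.5 means the score does not separate the two sets, which is the behaviour the caption reports for the random gene set.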
Figure 2. In vivo tumor formation of NetSig5000 and control sets
a) Experimental design. b) Tumorigenic potential of 23 NetSig5000 genes (NetSig candidates), 25 known oncogenes (Positive control), and 79 random genes (Random controls) in in vivo mouse tumorigenesis experiments. The x-axis indicates the maximum proliferation rate and the y-axis the maximum significance of enrichment in tumors relative to pre-injection samples. Dark grey boxes indicate one standard deviation from the median (lower confidence) and light grey boxes two standard deviations from the mean (higher confidence). c) Candidates that induce tumors at the higher and lower confidence thresholds, stratified by cell model. d) Proportion of the NetSig5000 candidates, positive control set, and random set, respectively, that induced tumors in mice. The left panel indicates the results at the level of cDNA constructs. The right panel indicates the results at the gene level.
Figure 3. Targeted re-analysis of oncogene-negative lung adenocarcinoma patients
a) Amplification of the nine genes that induce tumors in the lung adenocarcinoma-relevant cell model. As a group, the genes are significantly amplified (P = 7.0e-3), and AKT2 and TFDP2 are individually significantly amplified (FDR Q < 0.1). b), c) In-depth view of the amplified regions surrounding AKT2 and TFDP2, respectively. d), e) The proportion of oncogene-positive or oncogene-negative patients with −1, 0, 1, or 2 copy number changes of AKT2 or TFDP2. f), g) NetSig networks of AKT2 and TFDP2. Nodes other than AKT2 and TFDP2 are colored by the significance of the pan-cancer Q value of the corresponding gene, where light grey represents Q close to 1 and red Q << 1, with darker red representing more significant Q values as indicated below the relevant node.
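The captions report significance as FDR Q values (for example, Q < 0.1 for individual genes). The record does not state which FDR procedure was used, so the sketch below assumes the common Benjamini-Hochberg adjustment purely for illustration. The function name `benjamini_hochberg`, the gene labels, and the P values are all hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q values) in input order.

    Assumption: BH is used here only as a familiar example of an FDR
    procedure; the source does not specify the method behind its Q values.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-gene P values; labels and numbers are placeholders.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
p_vals = [0.001, 0.02, 0.04, 0.5]

q_vals = benjamini_hochberg(p_vals)
significant = [g for g, q in zip(genes, q_vals) if q < 0.1]
print(significant)
```

With a threshold of Q < 0.1, a gene is called significant only if its adjusted value, not its nominal P value, falls below the cutoff.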