Jiarui Ding, Melissa K McConechy, Hugo M Horlings, Gavin Ha, Fong Chun Chan, Tyler Funnell, Sarah C Mullaly, Jüri Reimand, Ali Bashashati, Gary D Bader, David Huntsman, Samuel Aparicio, Anne Condon, Sohrab P Shah.
Abstract
We present a novel hierarchical Bayes statistical model, xseq, to systematically quantify the impact of somatic mutations on expression profiles. We establish the theoretical framework and robust inference characteristics of the method using computational benchmarking. We then use xseq to analyse thousands of tumour data sets available through The Cancer Genome Atlas, to systematically quantify somatic mutations impacting expression profiles. We identify 30 novel cis-effect tumour suppressor gene candidates, enriched in loss-of-function mutations and biallelic inactivation. Analysis of trans-effects of mutations and copy number alterations with xseq identifies mutations in 150 genes impacting expression networks, with 89 novel predictions. We reveal two important novel characteristics of mutation impact on expression: (1) patients harbouring known driver mutations exhibit different downstream gene expression consequences; (2) expression patterns for some mutations are stable across tumour types. These results have critical implications for identification and interpretation of mutations with consequent impact on transcription in cancer.
Year: 2015 PMID: 26436532 PMCID: PMC4600750 DOI: 10.1038/ncomms9554
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
List of the twelve cancer types analysed.
| Cancer type | Mutation | RNASeq | SNP6.0 | Overlap |
|---|---|---|---|---|
| BLCA | 99 | 96 | 125 | 94 |
| BRCA | 772 | 822 | 879 | 743 |
| COAD | 155 | 192 | 414 | 149 |
| GBM | 291 | 167 | 576 | 144 |
| HNSC | 306 | 303 | 306 | 295 |
| KIRC | 417 | 428 | 452 | 390 |
| LAML | 196 | 173 | 197 | 167 |
| LUAD | 230 | 355 | 358 | 169 |
| LUSC | 178 | 220 | 342 | 177 |
| OV | 316 | 266 | 581 | 159 |
| READ | 69 | 71 | 163 | 65 |
| UCEC | 248 | 333 | 493 | 235 |
BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; COAD, colon adenocarcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LAML, acute myeloid leukaemia, also denoted as AML; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; READ, rectum adenocarcinoma; UCEC, uterine corpus endometrioid carcinoma.
The numbers are sample counts. In total, the overlapping samples contained 563,024 somatic mutations (363,676 missense mutations, 132,981 synonymous mutations, 33,838 nonsense mutations, 13,260 frameshift indels, 6,952 non-coding RNA mutations, 8,699 splice site mutations, 3,141 in-frame indels and 477 stop gain mutations). In the trans-analysis, we added the 37,308 homozygous deletions in 2,084 genes (focal copy number deletion peaks) and the 69,643 amplifications in 960 genes (focal copy number amplification peaks).
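As a quick consistency check on the counts above, the per-category numbers can be summed and compared with the stated total. The sketch below only restates figures from the footnote; the variable names are illustrative.

```python
# Mutation counts by category, copied from the table footnote.
counts = {
    "missense": 363_676,
    "synonymous": 132_981,
    "nonsense": 33_838,
    "frameshift indel": 13_260,
    "non-coding RNA": 6_952,
    "splice site": 8_699,
    "in-frame indel": 3_141,
    "stop gain": 477,
}

# Total across all categories; the footnote states 563,024.
total = sum(counts.values())

# Share of each category among all somatic mutations.
fractions = {name: n / total for name, n in counts.items()}

for name, share in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:>18}: {share:6.2%}")
```

The categories add up exactly to the reported total, and missense mutations account for roughly two thirds of all calls.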
Figure 1: Overview of the xseq modelling framework.
(a) The inputs to the xseq model: a mutation matrix typically from next-generation sequencing, a gene interaction network and a gene expression matrix. xseq models the expression of a gene across all the patients by mixture distributions. The three mixture components represent downregulation, neutral and upregulation, respectively. (b) The graphical model representation of xseq with the plate notation. Circles represent random variables and arrows denote dependencies between variables. Boxes are plates that represent replicates. For example, the graph represents a gene mutated in M patients (we assume that a gene is mutated only once in a patient), and the gene is connected to N genes. (c) xseq predicts the posterior marginal probabilities of each gene (P(D)), each mutation (P(F)) influencing expression and the regulatory probabilities of the genes connected to the mutated gene in a patient (P(G)).
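The three-component mixture in panel (a) can be illustrated with a minimal sketch that assigns one expression value to the downregulation, neutral or upregulation component. This is not the xseq implementation: the Gaussian component form, the weights, the means and the standard deviations below are all assumptions chosen for illustration, since the caption does not specify them.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical (weight, mean, sd) per component; not taken from the paper.
COMPONENTS = {
    "down": (0.2, -2.0, 1.0),
    "neutral": (0.6, 0.0, 1.0),
    "up": (0.2, 2.0, 1.0),
}

def responsibilities(x, components=COMPONENTS):
    """Posterior probability of each component given one expression value."""
    weighted = {
        name: weight * normal_pdf(x, mean, sd)
        for name, (weight, mean, sd) in components.items()
    }
    norm = sum(weighted.values())
    return {name: value / norm for name, value in weighted.items()}

for value in (-3.0, 0.0, 3.0):
    post = responsibilities(value)
    print(value, max(post, key=post.get))
```

Under these assumed parameters, a strongly negative value is attributed mostly to the downregulation component and a strongly positive one to upregulation.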
Figure 2: Theoretical performance of xseq on simulated data sets.
Each plot depicts a receiver operating characteristic (ROC) curve, which displays the true-positive rate as a function of the false-positive rate. The expression of genes that are downregulated, neutral and upregulated is (a) highly discriminative (first row), (b) moderately discriminative (second row) and (c) poorly discriminative (third row; see the inset plots, where cyan, grey and red denote downregulation, neutral and upregulation, respectively). The ROC curves in the first, second and third columns were computed when the degree of dysregulation of the expression of connected genes by mutations was high, moderate and low, respectively.
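For readers unfamiliar with ROC curves, the following minimal sketch computes the (false-positive rate, true-positive rate) points by sweeping a threshold over prediction scores, and the area under the curve by the trapezoidal rule. The function names and the toy inputs are illustrative assumptions, not part of the benchmarking described in the caption.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels are 1 for true positives and 0 for true negatives.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example: both positives are ranked above both negatives.
perfect = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
print(perfect, auc(perfect))
```

A curve that hugs the top-left corner (area close to 1) corresponds to the highly discriminative setting, while a curve near the diagonal (area close to 0.5) corresponds to scores that carry little information.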
Figure 3: Permutation analysis of the TCGA acute myeloid leukaemia data sets.
(a) Left panel shows the empirical distribution functions of P(D), and the right panel shows the empirical distribution functions of P(F) estimated from different permuted data sets. (b) Heatmap shows the expression of genes connected to RUNX1: red represents high expression and blue represents low expression. Here columns represent patients and rows represent genes. For the patients without RUNX1 mutations, we ‘assume’ the mutations still exist and estimate the probabilities of individual mutations P(F). The mutation type ‘complex’ of a gene in a patient represents the gene harbouring multiple types of mutations in the patient.
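The empirical distribution functions in panel (a) can be sketched as follows. The caption does not state which quantity was permuted, so the label-shuffling helper below is an assumption about one common permutation scheme (shuffling which patients carry a mutation), and all names and inputs are hypothetical.

```python
import bisect
import random

def ecdf(values):
    """Return the empirical distribution function F(x) = fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def distribution(x):
        return bisect.bisect_right(ordered, x) / n

    return distribution

def permute_labels(mutated, seed):
    """Shuffle mutation labels across patients (assumed permutation scheme)."""
    rng = random.Random(seed)
    shuffled = list(mutated)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical posterior probabilities, for illustration only.
dist = ecdf([0.1, 0.5, 0.5, 0.9])
print(dist(0.5))

# Hypothetical mutation status for six patients.
original = [1, 1, 0, 0, 0, 0]
shuffled = permute_labels(original, seed=0)
```

Comparing the distribution function obtained on the real data with those from permuted data shows how often high probabilities would arise when the mutation labels carry no information.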
Figure 4: The 65 genes harbouring loss-of-function mutations with strong cis-effects on their own expression.
(a) The predicted cis-effect loss-of-function mutations across 12 tumour types (P(D)≥0.8 in at least one tumour type). (b) The histograms of posterior marginals of mutations and genes across tumour types. (c) The posterior marginals of mutations separated based on copy number status. (d) The loss-of-function mutations in the 65 cis-effect genes (all-cis), 30 novel predictions (novel cis), 23 cis-effect tumour suppressor genes (TSG-cis), 108 non-cis-effect TSGs (TSG-other) and 30 negative control genes (negative controls) segregated based on copy number status. (e) A ‘novel’ tumour suppressor gene AMOT is not significantly mutated based on frequency-based methods, but AMOT is enriched in loss-of-function mutations (tumour suppressor gene probability P(TSG)=0.92). (f) The loss-of-function mutations in STAG2 typically correlate with lower expression, except for a splice donor site GT→GC mutation (both GT and GC are used by the splicing machinery). MuSiC SMG, significantly mutated genes predicted by MuSiC; TF, transcription factor; TSG probability, tumour suppressor gene probability.
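The selection rule in panel (a), P(D)≥0.8 in at least one tumour type, can be written as a short filter. The gene names and probabilities below are placeholders invented for illustration and are not results from the analysis.

```python
# Hypothetical P(D) per gene and tumour type; placeholders, not results from the paper.
posterior_d = {
    "GENE_A": {"BRCA": 0.95, "OV": 0.40},
    "GENE_B": {"BRCA": 0.30, "LUAD": 0.79},
    "GENE_C": {"GBM": 0.80},
}

def select_genes(posteriors, threshold=0.8):
    """Return genes whose P(D) reaches the threshold in at least one tumour type."""
    return sorted(
        gene
        for gene, by_type in posteriors.items()
        if any(p >= threshold for p in by_type.values())
    )

selected = select_genes(posterior_d)
print(selected)
```

A gene is kept as soon as a single tumour type passes the threshold, so a gene with weak evidence in most cancers can still be selected on the strength of one.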
Figure 5: NFE2L2 mutations and FECH upregulation.
NFE2L2 mutations were predicted to correlate with FECH expression upregulation in five types of cancer: BLCA, HNSC, LUAD, LUSC and UCEC. Each dot in the scatterplots represents the expression of NFE2L2 and FECH in a patient. A blue cross ‘×’ means the patient does not have NFE2L2 mutations. Other types of symbols represent different kinds of mutations (Hlamp, copy number amplifications). The filled colours encode the estimated mutation probability P(F) from trans-analysis (both FECH expression and the expression of other NFE2L2 interaction partners determine P(F)). HNSC, head and neck squamous cell carcinoma; LUAD, lung adenocarcinoma.
Figure 6: Patients harbouring the same gene mutations but with variations in trans-associated gene expression.
(a) In UCEC, CTNNB1 mutations correlated with the upregulation of one set of genes and the downregulation of another. The most strongly upregulated genes included BMP4 (in the TGF-β signalling pathway), NKD1, AXIN2, DKK4 and KREMEN1 (in the Wnt signalling pathway). The downregulated genes included the Wnt signalling pathway gene FZD5. Here red in the heatmap represents gene upregulation and blue represents gene downregulation. (b) The smallest unimodality dip-test P values of P(F) of the 127 significantly mutated genes across tumour types. (c) The mutation sites, mutation types and P(F) (filled colours) of CTNNB1 mutations and (d) RB1 mutations in UCEC. MSI, microsatellite instability; MSS, microsatellite stable; MSI-H, MSI-high; MSI-L, MSI-low; TGF-β, transforming growth factor-beta.