Mark D M Leiserson, Hsin-Ta Wu, Fabio Vandin, Benjamin J Raphael.
Abstract
Cancer is a heterogeneous disease with different combinations of genetic alterations driving its development in different individuals. We introduce CoMEt, an algorithm to identify combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, often observed for alterations in the same pathway. CoMEt includes an exact statistical test for mutual exclusivity and techniques to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five different cancer types, identifying both known cancer genes and pathways, and novel putative cancer genes.
Year: 2015 PMID: 26253137 PMCID: PMC4531541 DOI: 10.1186/s13059-015-0700-7
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 a Alteration matrices illustrating the difference between the combinatorial weight function W(M) introduced in Dendrix and the probabilistic score Φ(M) used in CoMEt. Both matrices contain four mutually exclusive alterations; the alteration frequency of each is indicated inside its bar. Samples without alterations are not shown in either matrix. Because both sets are exclusive and have the same total alteration frequency, the Dendrix weight function does not distinguish between them. Sets like M (blue) are common in cancer genome studies, which often have a small number of recurrently mutated genes and a long tail of rarely mutated genes. The score used in CoMEt conditions on the observed frequency of each alteration, giving more significance to the set M′ (green). b An example of a 2×2×2 contingency table X for the set M = {m₁, m₂, m₃}, illustrating how samples are cross-classified as exclusive, co-occurring, or absent for each alteration. The test statistic ϕ(M) used by CoMEt is the sum of the highlighted exclusive cells.
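The cross-classification in panel b can be illustrated with a minimal sketch. It assumes the alteration data are given as a list of 0/1 rows (one row per sample, one column per alteration); the function names and the toy matrix are hypothetical and are not taken from the CoMEt software. The sketch only builds the 2^k table and sums the exclusive cells; it does not compute the exact-test probability Φ(M).

```python
from collections import Counter
from itertools import product


def contingency_table(matrix, columns):
    """Count samples for every presence/absence pattern over the chosen columns.

    Returns a dict with 2**k entries (k = len(columns)), keyed by 0/1 tuples.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in matrix)
    return {
        pattern: counts.get(pattern, 0)
        for pattern in product((0, 1), repeat=len(columns))
    }


def exclusive_count(table):
    """Sum the cells in which exactly one alteration of the set is present."""
    return sum(n for pattern, n in table.items() if sum(pattern) == 1)


# Toy binary alteration matrix (illustrative values only): 6 samples x 3 alterations.
A = [
    [1, 0, 0],  # exclusive
    [1, 0, 0],  # exclusive
    [0, 1, 0],  # exclusive
    [0, 0, 1],  # exclusive
    [1, 1, 0],  # co-occurring
    [0, 0, 0],  # absent
]

table = contingency_table(A, [0, 1, 2])
phi = exclusive_count(table)
print(phi)
```

For three alterations the table has eight cells, matching the 2×2×2 layout in the caption; the three cells with exactly one alteration present are the "exclusive" cells.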
Fig. 2 Overview of the CoMEt algorithm. First, we transform alteration data from different measurements into a binary alteration matrix A. Second, we use a Markov chain Monte Carlo (MCMC) algorithm to sample collections M, containing t sets of k alterations, in proportion to the weight Φ(M)−. Here we show a collection containing sets M and M′ with three and two alterations, respectively. We identify all collections whose weight exceeds the maximum observed in randomly permuted datasets. We summarize the alterations in these significant collections with a marginal probability graph, whose edge weights indicate the fraction of significant collections with the corresponding pair of alterations. Finally, we remove low-weight edges in the graph, obtaining the output modules.
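The last two steps of this overview (marginal probability graph, then edge removal) can be sketched as follows. This is an illustration under stated assumptions, not the CoMEt implementation: it assumes a pair of alterations counts toward an edge when both appear in the same set of a collection, takes the output modules to be the connected components left after thresholding, and uses a hypothetical `min_weight` cutoff and placeholder alteration names.

```python
from collections import defaultdict
from itertools import combinations


def marginal_probability_graph(collections):
    """Edge weight = fraction of collections in which a pair shares a set.

    `collections` must be non-empty; each collection is a list of sets.
    """
    counts = defaultdict(int)
    for collection in collections:
        pairs = set()
        for gene_set in collection:
            pairs.update(frozenset(p) for p in combinations(sorted(gene_set), 2))
        for pair in pairs:
            counts[pair] += 1
    total = len(collections)
    return {pair: count / total for pair, count in counts.items()}


def output_modules(edge_weights, min_weight):
    """Drop edges below min_weight and return the connected components."""
    adjacency = defaultdict(set)
    for pair, weight in edge_weights.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search over the thresholded graph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Four hypothetical "significant" collections over placeholder alterations A-F.
significant = [
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B"}, {"D", "E"}],
    [{"A", "C"}, {"D", "F"}],
    [{"A", "B"}, {"D", "E"}],
]

weights = marginal_probability_graph(significant)
modules = output_modules(weights, min_weight=0.5)
print(modules)
```

In this toy input the pairs B–C and D–F each appear in only one of four collections, so they fall below the cutoff and F is left out of every module.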
Fig. 3 Comparison of CoMEt with other methods on simulated data with n=500 samples. a The average F-measure of each method over 25 simulated datasets with varying coverage of the implanted pathway: CoMEt (blue), mutex (black), muex (brown), and Dendrix (red). b Comparison of CoMEt and Multi-Dendrix in identifying an implanted collection containing multiple sets of alterations. Bars indicate the average adjusted Rand index between the reported and implanted collections across 25 simulated datasets.
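As a reminder of what the F-measure in panel a summarizes, here is a minimal sketch using the standard definition (harmonic mean of precision and recall). It assumes the comparison is between a reported set and an implanted set of alterations; the exact benchmarking protocol is not given in this caption, and the alteration names below are placeholders.

```python
def f_measure(reported, implanted):
    """Harmonic mean of precision and recall between two sets of alterations."""
    reported, implanted = set(reported), set(implanted)
    true_positives = len(reported & implanted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(reported)
    recall = true_positives / len(implanted)
    return 2 * precision * recall / (precision + recall)


# Placeholder example: three of four reported alterations are in the implanted set.
score = f_measure({"a", "b", "c", "x"}, {"a", "b", "c", "d"})
print(score)
```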
Fig. 4 CoMEt results on TCGA AML, consisting of four modules. Each circle represents the alterations in a gene or genomic region. The number in the circle indicates the number of samples in which the alteration occurs. Black lines are edges in the marginal probability graph with indicated probabilities. Orange polygons indicate the sets in the collection M with the most significant value Φ(M). Below each most significant set (orange) are its score Φ and coverage.
Fig. 5 CoMEt results on TCGA GBM. a Two output modules from CoMEt, shown in the same style as in Fig. 4. Characters in parentheses following a gene name indicate copy number aberrations: (D) is a deletion and (A) is an amplification. b Different splice variants of CDKN2A are part of both the Rb signaling (left) and p53 signaling (right) pathways. CoMEt recovers this relationship as two separate mutually exclusive gene sets. The gene sets {RB1, CDK4} and {MDM2, TP53} have a statistically significant number of co-occurring mutations (P = 6×10⁻²¹, dotted orange line), far more significant than the co-occurrence between pairs of genes in these sets (dotted red lines with corresponding P-values).
Fig. 6 CoMEt results on (a) TCGA STAD subtypes and (b) TCGA BRCA subtypes. The style is the same as in Fig. 5, except for the addition of subtype alterations (brown) and additional characters in parentheses following a gene name: (AS) is an alternative splicing event and (F) is a fusion gene. An edge between a subtype (brown vertex) and an alteration indicates that the alteration occurs frequently in that subtype.
Comparison of CoMEt, Multi-Dendrix, and mutex on the TCGA GBM dataset from the TCGA Pan-Cancer project [5], with and without mutation filtering. The consensus modules output by each algorithm are shown for both versions of the dataset. The (A) and (D) following gene names indicate amplifications and deletions, respectively.
| Algorithm | Without filtering | With filtering |
|---|---|---|
| CoMEt | ||
| 1. | 1. | |
| 2. | 2. | |
| 3. | 3. | |
| 4. | 4. | |
| Multi-Dendrix | ||
| 1. | 1. | |
| 2. | ||
| mutex | 1. | 1. |
| 2. |
Bolded genes indicate differences in output with and without mutation filtering