Keegan D Korthauer, Christina Kendziorski.
Abstract
MOTIVATION: Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. Most methods for prioritizing mutations rely primarily on frequency-based criteria, where a gene is identified as having a driver mutation if it is altered in significantly more samples than expected according to a background model. Although useful, frequency-based methods are limited in that all mutations are treated equally. It is well known, however, that some mutations have no functional consequence, while others may have a major deleterious impact. The spatial pattern of mutations within a gene provides further insight into their functional consequence. Properly accounting for these factors improves both the power and accuracy of inference. Also important is an accurate background model.
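The frequency-based criterion described above can be illustrated with a one-sided binomial test. This is only a minimal sketch of the generic idea, not the model proposed in the paper; the function name `binomial_upper_tail`, the sample size, the background probability and the observed count are all hypothetical values chosen for illustration.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 300 tumor samples, a background model that gives a
# 1% chance of the gene being mutated in any one sample, and 12 samples
# in which the gene is actually observed to be mutated.
n_samples = 300
background_prob = 0.01
observed_mutated = 12

# A small p-value means the gene is mutated in more samples than the
# (assumed) background model would predict.
p_value = binomial_upper_tail(observed_mutated, n_samples, background_prob)
```

A test of this kind treats every mutation identically, which is exactly the limitation the abstract points out: it ignores mutation type, functional impact and position within the gene.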
Year: 2015 PMID: 25573922 PMCID: PMC4426832 DOI: 10.1093/bioinformatics/btu858
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Counts of samples with mutation by position and type for TCGA ovarian and COSMIC (Catalogue Of Somatic Mutations In Cancer) datasets. The left panel displays the ten genes with the lowest entropy in COSMIC (putative oncogenes) that have at least one mutation in TCGA ovarian. The right panel displays the ten genes with the highest proportion of truncating mutations (putative tumor-suppressor genes) that have at least one mutation in TCGA ovarian. Blue represents missense mutations and red represents a location with at least one truncating mutation. Each vertical bar spans five amino acids, and darker colors correspond to more mutations. For genes with more than 500 mutations, a random sample of 500 was plotted, and positions with more than 25 mutations are given the same color intensity as those with 25 mutations.
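The caption ranks genes by the entropy of their mutation positions. As a rough sketch of how such a score could be computed, the snippet below uses Shannon entropy over the empirical distribution of mutated positions. The exact definition used by the authors is not given in this record, so the formula, the function name `positional_entropy` and the position lists are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def positional_entropy(positions):
    """Shannon entropy (in bits) of the empirical distribution of positions."""
    counts = Counter(positions)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical amino-acid positions of missense mutations in two genes.
hotspot_gene = [12, 12, 12, 12, 12, 12, 13, 61]          # clustered mutations
spread_gene = [5, 48, 101, 230, 377, 402, 519, 640]      # scattered mutations

hotspot_entropy = positional_entropy(hotspot_gene)
spread_entropy = positional_entropy(spread_gene)
```

Under this assumed definition, mutations concentrated at a few positions give low entropy (the pattern the caption associates with putative oncogenes), while mutations scattered along the gene give high entropy.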
Summary of features of methods to identify driver genes
| Methods | Mutation Type | Frequency | Gene-specific Background | Functional Impact | Spatial Patterning |
|---|---|---|---|---|---|
| MADGiC | ✓ | ✓ | ✓ | ✓ | ✓ |
| MuSiC | ✓ | ✓ | |||
| YS | ✓ | ✓ | ✓ | ||
| MutSigCV | ✓ | ✓ | ✓ | ||
| OncodriveFM | ✓ | ✓ | |||
| OncodriveCLUST | ✓ | ✓ |
Fig. 2. Mutation rate is shown to depend significantly on replication timing region and expression level. Specifically, mutation rate is shown for three replication timing regions (top) and for three levels of expression (bottom) for four types of mutations in TCGA ovarian data. Within each mutation type, chi-square tests of mutation counts stratified by replication timing or expression level categories were found to be significant (P < 0.05).
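A chi-square test of this kind can be sketched as follows for a single mutation type and three categories. The table layout (mutated versus unmutated positions per category), the function names and all counts are assumptions made for illustration; the record does not state how the authors arranged their counts. With a 2 x 3 table the test has 2 degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-x/2).

```python
from math import exp


def chi_square_2xk(mutated, not_mutated):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    col_totals = [m + u for m, u in zip(mutated, not_mutated)]
    row_totals = [sum(mutated), sum(not_mutated)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip((mutated, not_mutated), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-stat / 2)


# Hypothetical counts for early, middle and late replication timing regions.
mutated = [30, 45, 75]
not_mutated = [9970, 9955, 9925]

stat = chi_square_2xk(mutated, not_mutated)
p_value = p_value_df2(stat)  # only valid because k = 3 gives 2 degrees of freedom
```

For tables with a different number of categories, the closed-form p-value no longer applies and a general chi-square distribution function would be needed.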
Simulation results
| Simulation | Tissue | Metric | MutSigCV | YS | MADGiC (No FI) | MADGiC (SIFT) | MADGiC (Ideal FI) |
|---|---|---|---|---|---|---|---|
| SIM I | Ovary | Power | 0.05 | 0.30 | 0.42 | 0.51 | 0.86 |
| SIM I | Ovary | FDR | 0.04 | 0.04 | 0.04 | 0.04 | 0.02 |
| SIM I | Lung | Power | 0.01 | 0.16 | 0.27 | 0.31 | 0.75 |
| SIM I | Lung | FDR | 0.07 | 0.08 | 0.07 | 0.06 | 0.03 |
| SIM II | Ovary | Power | 0.06 | 0.33 | 0.45 | 0.55 | 0.86 |
| SIM II | Ovary | FDR | 0.02 | 0.32 | 0.08 | 0.09 | 0.04 |
| SIM II | Lung | Power | 0.02 | 0.36 | 0.30 | 0.34 | 0.77 |
| SIM II | Lung | FDR | 0.58 | 0.97 | 0.32 | 0.30 | 0.05 |
Note: Power and FDR are averaged over 100 SIM I datasets, where dependence of mutation rate on replication timing and expression level is ignored, and 100 SIM II datasets, where this dependence is preserved. The first set of simulations was designed to mimic TCGA ovarian data, which has a relatively large sample size, an average number of mutations and relatively little variability among sample-specific mutation rates; the second set is based on TCGA lung data, with smaller sample size, larger number of mutations and greater heterogeneity in sample-specific mutation rates.
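For a single simulated dataset, power and FDR can be computed from the set of genes a method calls and the set of genes simulated as true drivers. The sketch below uses the standard definitions (power as the fraction of true drivers recovered, FDR as the fraction of calls that are false); the function name `power_and_fdr` and the gene labels are hypothetical, and the averaging over 100 datasets described in the note is not shown.

```python
def power_and_fdr(called, true_drivers):
    """Return (power, FDR) for one dataset.

    power = true positives / number of true drivers
    FDR   = false positives / number of called genes
    """
    called, true_drivers = set(called), set(true_drivers)
    true_positives = len(called & true_drivers)
    power = true_positives / len(true_drivers) if true_drivers else 0.0
    fdr = (len(called) - true_positives) / len(called) if called else 0.0
    return power, fdr


# Hypothetical gene labels for one simulated dataset.
true_drivers = {"G1", "G2", "G3", "G4"}
called = {"G1", "G2", "G9"}

power, fdr = power_and_fdr(called, true_drivers)
```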
Case study results
| Tissue | Metric | MADGiC | YS | MutSigCV | OncodriveFM | OncodriveCLUST |
|---|---|---|---|---|---|---|
| Ovary | Total found | 19 | 70 | 5 | 21 | 20 |
| Ovary | Put. driver fraction | 0.579 | 0.129 | 0.400 | 0.381 | 0.250 |
| Lung | Total found | 47 | 585 | 7 | 85 | 55 |
| Lung | Put. driver fraction | 0.213 | 0.019 | 0.571 | 0.153 | 0.145 |
Note: For each method applied to the two case studies (TCGA ovarian and lung), we report the total number of driver genes identified, along with the proportion of those found that are putative drivers [i.e. they are on the list identified by Vogelstein].
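Because the case study table reports totals and fractions rather than counts, it can help to back-calculate the approximate number of putative drivers each method found. The snippet below does this by multiplying and rounding; the resulting counts are implied by the reported values and are not stated in the source, and the variable names are illustrative.

```python
# (total found, putative driver fraction) as reported in the case study table.
case_study = {
    "Ovary": {
        "MADGiC": (19, 0.579),
        "YS": (70, 0.129),
        "MutSigCV": (5, 0.400),
        "OncodriveFM": (21, 0.381),
        "OncodriveCLUST": (20, 0.250),
    },
    "Lung": {
        "MADGiC": (47, 0.213),
        "YS": (585, 0.019),
        "MutSigCV": (7, 0.571),
        "OncodriveFM": (85, 0.153),
        "OncodriveCLUST": (55, 0.145),
    },
}

# Implied count = total found x fraction, rounded to the nearest whole gene.
implied_putative_drivers = {
    tissue: {method: round(total * fraction) for method, (total, fraction) in methods.items()}
    for tissue, methods in case_study.items()
}
```

Read this way, a high fraction on a short list (for example MutSigCV in lung) and a low fraction on a long list (for example YS in lung) can correspond to similar absolute numbers of putative drivers, so the two rows of the table are best interpreted together.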
Fig. 3. The proportion of driver genes identified by each method in each replication timing (top) and expression level (bottom) category for the TCGA ovarian case study. Refer to Supplementary Section S5 for similar results from the lung case study.