Rafsan Ahmed, Cesim Erten, Aissa Houdjedj, Hilal Kazan, Cansu Yalcin.
Abstract
One of the key concepts employed in cancer driver gene identification is mutual exclusivity (ME): a driver mutation is less likely to occur if an earlier mutation with common functionality has already occurred in the same molecular pathway. Several ME tests have been proposed recently; however, the current protocols for evaluating ME tests have two main limitations. First, the evaluations are mostly performed on simulated data, and second, the evaluation metrics lack a network-centric view. The latter is especially crucial, as common functionality can be identified by searching for interaction patterns in relevant networks. We propose a network-centric framework to evaluate the pairwise significances found by statistical ME tests. It has three main components. The first component consists of the metrics employed in the network-centric ME evaluations. These metrics are designed so that network knowledge and the reference set of known cancer genes are incorporated into ME evaluations under a careful definition of proper control groups. The other two components are designed as further mechanisms to avoid confounders inherent in ME detection on top of the network-centric view. To this end, the second component dissects the side effects caused by mutation load artifacts, where mutations driving tumor subtypes with low mutation load might be incorrectly diagnosed as mutually exclusive. Finally, the third component resolves the confounding that stems from nonspecific interaction networks, generated by combining interactions from different tissues, through the creation and use of tissue-specific networks. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/NetCentric.
Keywords: cancer drivers; cancer genomics; mutual exclusivity; network-centric mutual exclusivity evaluation; tumor mutation load
Year: 2021 PMID: 34899838 PMCID: PMC8664367 DOI: 10.3389/fgene.2021.746495
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
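The pairwise significances discussed in the abstract come from statistical ME tests applied to two genes at a time. As a minimal sketch of the simplest such test listed in the results (Fisher's exact test), the following computes a one-sided p-value for observing as few co-mutated samples as were seen. The function name `me_pvalue` and the 0/1 input encoding are assumptions made for illustration; this is not the code from the NetCentric repository, and the other tests (DISCOVER, MEGSA, MEMo, WExT) use different null models.

```python
from math import comb

def me_pvalue(mut_a, mut_b):
    """One-sided Fisher's exact test for mutual exclusivity.

    mut_a, mut_b: equal-length sequences of 0/1 flags, one per sample,
    marking whether gene A (resp. gene B) is mutated in that sample.
    Returns P(X <= observed co-occurrences) under the hypergeometric null.
    """
    if len(mut_a) != len(mut_b):
        raise ValueError("both genes must be described over the same samples")
    n = len(mut_a)
    n_a = sum(1 for x in mut_a if x)          # samples with gene A mutated
    n_b = sum(1 for y in mut_b if y)          # samples with gene B mutated
    both = sum(1 for x, y in zip(mut_a, mut_b) if x and y)
    k_min = max(0, n_a + n_b - n)             # smallest possible overlap
    total = comb(n, n_b)
    # Sum the hypergeometric probabilities of every overlap no larger
    # than the observed one.
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k)
               for k in range(k_min, both + 1))
    return tail / total

# Hypothetical toy cohort: 10 samples, the two genes never co-occur.
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(me_pvalue(gene_a, gene_b))
```

A small p-value indicates fewer co-occurrences than expected by chance. The test ignores per-sample mutation load, which is the confounder the framework's second component addresses.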
Results of network-centric ME evaluation framework with control group COADREAD t20 (498 samples, 196 CGC-CGC pairs).
| Method | Precision | Sensitivity | F1 Score | Precision (strict) | Sensitivity (strict) | F1 Score (strict) |
|---|---|---|---|---|---|---|
| DISCOVER | 0.661 | 0.220 | 0.331 | 0.708 | 0.183 | 0.291 |
| DISCOVER Strat | 0.727 | 0.041 | 0.078 | 0.727 | 0.041 | 0.078 |
| Fisher’s Exact Test | 0.500 | 0.031 | 0.058 | 0.500 | 0.031 | 0.058 |
| MEGSA | 0.611 | 0.056 | 0.103 | 0.588 | 0.051 | 0.094 |
| MEMO | 0.658 | 0.329 | 0.439 | 0.647 | 0.237 | 0.347 |
| WExT | 0.676 | 0.403 | 0.505 | 0.725 | 0.329 | 0.453 |
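The F1 scores in the table above can be cross-checked against the precision and sensitivity columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and sensitivity, which the record itself does not spell out. The values are copied from the non-strict columns of the table, and small differences are expected because the inputs are rounded to three decimals.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (standard F1 definition)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)

# (precision, sensitivity, reported F1) copied from the table above.
rows = {
    "DISCOVER": (0.661, 0.220, 0.331),
    "DISCOVER Strat": (0.727, 0.041, 0.078),
    "Fisher's Exact Test": (0.500, 0.031, 0.058),
    "MEGSA": (0.611, 0.056, 0.103),
    "MEMO": (0.658, 0.329, 0.439),
    "WExT": (0.676, 0.403, 0.505),
}

for method, (p, s, reported) in rows.items():
    computed = f1_score(p, s)
    print(f"{method}: computed {computed:.3f}, reported {reported:.3f}")
```

Under this definition, a method with high precision but very low sensitivity (for example DISCOVER Strat) still receives a low F1 score, which matches the pattern in the table.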
Results of network-centric ME evaluation framework with control group COADREAD t20 (498 samples, 107 CGC-CGC pairs).
| Method | Precision | Sensitivity | F1 Score | Precision (strict) | Sensitivity (strict) | F1 Score (strict) |
|---|---|---|---|---|---|---|
| DISCOVER | 0.537 | 0.276 | 0.365 | 0.579 | 0.210 | 0.308 |
| DISCOVER Strat | 0.455 | 0.048 | 0.086 | 0.400 | 0.038 | 0.069 |
| Fisher’s Exact Test | 0.444 | 0.038 | 0.069 | 0.375 | 0.028 | 0.052 |
| MEGSA | 0.571 | 0.075 | 0.133 | 0.538 | 0.066 | 0.118 |
| MEMO | 0.566 | 0.388 | 0.460 | 0.495 | 0.215 | 0.300 |
| WExT | 0.575 | 0.438 | 0.497 | 0.596 | 0.295 | 0.395 |
Results of network-centric ME evaluation framework with control group COADREAD t5 (498 samples, 1748 CGC-CGC pairs).
| Method | Precision | Sensitivity | F1 Score | Precision (strict) | Sensitivity (strict) | F1 Score (strict) |
|---|---|---|---|---|---|---|
| DISCOVER | 0.647 | 0.052 | 0.096 | 0.658 | 0.046 | 0.086 |
| DISCOVER Strat | 0.618 | 0.012 | 0.024 | 0.618 | 0.012 | 0.024 |
| Fisher’s Exact Test | 0.583 | 0.008 | 0.016 | 0.565 | 0.007 | 0.014 |
| WExT | 0.645 | 0.121 | 0.203 | 0.668 | 0.102 | 0.177 |
Results of network-centric ME evaluation framework with control group COADREAD t5 (498 samples, 1625 CGC-CGC pairs).
| Method | Precision | Sensitivity | F1 Score | Precision (strict) | Sensitivity (strict) | F1 Score (strict) |
|---|---|---|---|---|---|---|
| DISCOVER | 0.721 | 0.052 | 0.097 | 0.746 | 0.048 | 0.090 |
| DISCOVER Strat | 0.641 | 0.013 | 0.025 | 0.641 | 0.013 | 0.025 |
| Fisher’s Exact Test | 0.619 | 0.008 | 0.016 | 0.619 | 0.008 | 0.016 |
| WExT | 0.670 | 0.118 | 0.200 | 0.712 | 0.103 | 0.180 |
FIGURE 1. Comparison of mutual exclusivity results of DISCOVER and DISCOVER Strat on the TCGA COADREAD cohort (498 samples). (A) Scatterplot of the percentage of significant ME runs (p-value threshold 0.05) of DISCOVER on COADREAD data, where tests are performed between a CGC gene and a random subset of other CGC genes, so that the ME of a CGC gene of interest is checked against same-sized groups of genes in both (A) and (B). (B) Scatterplot of the percentage of significant ME runs of DISCOVER, where tests are performed between a CGC gene and its PPI neighbors that are in CGC (red), compared with (A) in gray. (C) Scatterplot of the percentage of significant ME runs of DISCOVER Strat, where tests are performed between a CGC gene and a random subset of other CGC genes (blue), so that the ME of a CGC gene of interest is checked against same-sized groups of genes in both (C) and (D); results from (A) are shown in gray for comparison. (D) Scatterplot of the percentage of significant ME runs of DISCOVER Strat, where tests are performed between a CGC gene and its PPI neighbors that are in CGC (red), compared with (C) in blue.
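The comparison described in the Figure 1 caption, a gene's CGC neighbors against a same-sized random group of other CGC genes, can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation. The function names, the `pvalues` dictionary keyed by unordered gene pairs, the strict `< alpha` cutoff, and sampling the control group from all other CGC genes are assumptions.

```python
import random

def percent_significant(gene, partners, pvalues, alpha=0.05):
    """Percentage of (gene, partner) ME tests with p-value below alpha."""
    if not partners:
        return 0.0
    hits = sum(1 for p in partners if pvalues[frozenset((gene, p))] < alpha)
    return 100.0 * hits / len(partners)

def neighbor_vs_random(gene, neighbors, cgc_genes, pvalues, alpha=0.05, seed=0):
    """Compare a gene's CGC neighbors with a same-sized random CGC control group.

    Returns (percent significant among neighbors, percent among controls).
    """
    rng = random.Random(seed)                       # fixed seed for repeatability
    others = sorted(set(cgc_genes) - {gene})        # all other CGC genes
    cgc_neighbors = sorted(set(neighbors) & set(others))
    control = rng.sample(others, len(cgc_neighbors))  # same-sized random group
    return (percent_significant(gene, cgc_neighbors, pvalues, alpha),
            percent_significant(gene, control, pvalues, alpha))

# Hypothetical toy input: gene "A" with PPI neighbors "B" and "C".
toy_pvalues = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.20,
    frozenset(("A", "D")): 0.90,
    frozenset(("A", "E")): 0.80,
}
print(neighbor_vs_random("A", ["B", "C"], ["A", "B", "C", "D", "E"], toy_pvalues))
```

Matching the control group's size to the number of neighbors keeps the two percentages comparable for each gene, which is the point of plotting them against each other.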
FIGURE 2. Waterfall plots of the distribution of mutations for selected gene pairs. (A) Mutation distribution of two selected gene pairs (BRAF-NRAS and SMAD4-SMAD3) that are found to be significantly mutually exclusive by both DISCOVER and DISCOVER Strat. (B) Mutation distribution of two selected gene pairs (APC-NUP98 and KRAS-PDE4DIP) that are found to be significantly mutually exclusive by DISCOVER but not by DISCOVER Strat. Note that the set of samples included in each plot consists of the patients that have a mutation in at least one of the listed genes. The GenVisR R package is used to generate the waterfall plots (Skidmore et al., 2016). Subtype information is downloaded from Guinney et al. (2015).
FIGURE 3. Performance of selected ME tests in discriminating TSN and non-TSN gene pairs based on estimated ME p-values on COADREAD data. The blue curve is plotted with CGC gene pairs and the red curve with non-CGC gene pairs. Mutual exclusivities are estimated with (A) DISCOVER, (B) DISCOVER Strat, (C) Fisher’s Exact Test, (D) MEGSA, (E) MEMo and (F) WExT, respectively.
FIGURE 4. The number of recovered CGC genes for the original MEXCOwalk and for its modified versions, in which mutual exclusivity values are estimated with DISCOVER, Fisher’s Exact Test and WExT. The COADREAD dataset is used with the t = 5 setting. The numbers in parentheses indicate the area under the corresponding ROC curve.