Vu Viet Hoang Pham1, Lin Liu1, Cameron Bracken2,3, Gregory Goodall2,3, Jiuyong Li1, Thuc Duy Le1.
Abstract
Identifying the genes responsible for driving cancer is of critical importance for directing treatment. Accordingly, multiple computational tools have been developed to facilitate this task. Because these tools employ different methods and consider different data, and because the field is evolving rapidly, selecting an appropriate tool for cancer driver discovery is not straightforward. This survey seeks to provide a comprehensive review of the different computational methods for discovering cancer drivers. We categorise the methods into three groups: methods for single driver identification, methods for driver module identification, and methods for identifying personalised cancer drivers. In addition to providing a "one-stop" reference of these methods, we evaluate and compare their performance, giving readers information about the different capabilities of the methods in identifying biologically significant cancer drivers. The biological relevance of the results can be seen through the enrichment of discovered cancer drivers in GO biological processes and KEGG pathways and through our identification of a small cancer-driver cohort that is capable of stratifying patient survival. © The author(s).
Keywords: cancer driver; cancer driver discovery; coding gene; computational method; microRNA
Year: 2021 PMID: 33859763 PMCID: PMC8039954 DOI: 10.7150/thno.52670
Source DB: PubMed Journal: Theranostics ISSN: 1838-7640 Impact factor: 11.556
Figure 1. Cancer drivers and genes with mutations. Genes with driver mutations are cancer drivers. Some genes that do not contain mutations but regulate driver mutations in the development of cancer are also considered cancer drivers.
Figure 2. Categorisation of cancer driver discovery methods. The methods are categorised in three groups: single cancer driver identification, cancer driver module identification, and personalised cancer driver identification. Single cancer driver identification includes two sub-groups: mutation-based methods and network-based methods. Mutation-based methods discover cancer drivers using mutation significance, functional impact of mutations, etc. Most cancer driver module identification methods use the mutual exclusivity of mutations to identify modules of cancer drivers.
Summary of methods for identifying single cancer drivers
| Method | Description and reference | Additional information |
|---|---|---|
| MutSigCV | Assesses the significance of mutations in DNA sequencing to discover cancer driver genes | The result includes false positives (i.e. passenger mutations with a high degree) |
| OncodriveFM | Uses the functional impact of mutations of genes to detect cancer drivers with the hypothesis that any bias of variations with a significantly functional impact in genes can be used to identify candidate driver genes | It can identify driver genes with low mutation recurrence |
| OncodriveFML | Uses the functional impact of gene mutations to reveal both coding and non-coding drivers | It is applied to 19 cancer datasets and detects several well-known drivers |
| DriverML | Uses the functional impact of mutations to unravel cancer drivers through a supervised machine learning approach | It could be improved by integrating additional well-annotated datasets (e.g. CGC) into the training data |
| ActiveDriver | Looks at the enrichment of mutations in externally defined regions to uncover cancer driver genes | It only analyses missense mutations, although other mutation types (e.g. in-frame deletions, frameshift deletions) are also important |
| SGDriver | Uses a Bayes inference statistical framework to incorporate somatic missense mutations into protein-ligand binding-site residues in order to determine the functional role of the mutations | It could be improved by integrating more mutation types and by using molecular networks to identify the interacting partners of mutated proteins, which would expand the candidate pool |
| AlloDriver | Maps mutations to allosteric/orthosteric sites derived from the three-dimensional protein structures to detect potentially functional genes/proteins in cancer patients | It also uses only missense mutations |
| OncodriveCLUST | Detects cancer genes with a large bias in clustering mutations based on the idea that gain-of-function mutations usually cluster in particular protein sections and these mutations contribute to the development of cancer cells | It cannot identify cancer drivers whose mutations are distributed across the sequence |
| IntOGen-mutations | Uses somatic mutations, gene expression, and tumour pathways to identify cancer drivers for various tumour types by combining OncodriveFM and OncodriveCLUST | It can discover driver mutations which are distributed across the sequence and have significant functional impacts |
| PathScan | Combines genomic mutations with the information of genes in known pathways to uncover cancer driver genes | It can be extended to integrate other types of genetic anomalies |
| Sakoparnig et al. | Introduces a computational method to detect genomic alterations with low occurrence frequencies based on mutation timing | It may not discover drivers which are already present at very early cancer stages as we cannot observe a steep rise for them |
| CONEXIC | Applies a score-guided search to detect combinations of modulators which reflect the expression of a gene module in a set of tumour samples, then identifies those which have the highest score in amplified or deleted regions | It is mainly based on copy number aberrations |
| ncDriver | Screens non-coding mutations with conservations and cancer specificity to reveal non-coding cancer drivers | It tests both recurrence and distribution of mutations to identify cancer drivers |
| HotSpot3D | Identifies spatial hotspots to interpret the function of mutations in the encoded protein | It can detect rare cancer drivers |
| 3D clusters | Clusters somatic mutations in cancer to identify rare mutations based on 3D protein structures | It is limited due to the lack of complete protein structure data for several genes |
| Vinayagam et al. | Applies controllability analysis on the directed network of human protein-protein interaction to identify disease genes | As it uses a general protein network (i.e. not specific for a cancer type), uncovered drivers are not particular for any cancer type |
| CBNA | Identifies coding and miRNA cancer drivers by analysing the controllability of the miRNA-TF-mRNA network and mutation data | It builds the gene network for a specific cancer type, thus the results are for the cancer type of interest |
| DriverNet | Uncovers cancer drivers by evaluating the influence of mutations on transcriptional networks in cancer | It relies on a predetermined influence graph which is sparse and incomplete |
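
Two of the network-based entries above (Vinayagam et al. and CBNA) rely on controllability analysis of a directed network. The table does not spell out the algorithm, so the following is only a minimal sketch of one common formulation, structural controllability, which is an assumption here: find a maximum matching in the directed network and treat the nodes left unmatched as the ones that need direct control. The gene names and edges are hypothetical.

```python
def max_matching(edges, nodes):
    """Maximum matching of a directed graph, viewed as a bipartite graph
    from the 'source' copy of each node to the 'target' copy."""
    successors = {node: [] for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    matched_from = {}  # target node -> source node whose edge matches it

    def augment(src, seen):
        # Kuhn's augmenting-path step: try to give `src` a matched target.
        for dst in successors[src]:
            if dst in seen:
                continue
            seen.add(dst)
            if dst not in matched_from or augment(matched_from[dst], seen):
                matched_from[dst] = src
                return True
        return False

    for node in nodes:
        augment(node, set())
    return matched_from


def driver_nodes(edges, nodes):
    """Nodes with no matched incoming edge; these need direct control."""
    matched_from = max_matching(edges, nodes)
    unmatched = [node for node in nodes if node not in matched_from]
    # With a perfect matching, controlling any single node is sufficient.
    return unmatched or [nodes[0]]


# Hypothetical toy regulatory network: GENE_A -> GENE_B -> {GENE_C, GENE_D}
toy_nodes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_B", "GENE_D")]
print(driver_nodes(toy_edges, toy_nodes))
```

In this toy network GENE_B can pass control to only one of its two targets, so the other target has to be controlled directly, alongside GENE_A, which has no regulator at all.
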
Summary of methods for identifying cancer driver modules
| Method | Description and reference | Additional information |
|---|---|---|
| CoMEt | Identifies cancer genes by applying an exact statistical test for the mutual exclusivity of genomic events and analyses multiple sets of mutually exclusive alterations simultaneously | It has a low computational complexity |
| WeSME | Discovers cancer drivers by evaluating the mutual exclusivity of mutations of gene pairs | It can only detect driver gene pairs (i.e. only two driver genes in each module) |
| MEMo | Analyses mutual exclusivity of mutated genes in subnetworks to identify mutual exclusivity modules in cancer | It depends on the prior biological knowledge of gene interactions |
| iMCMC | Uses the cancer genomic data including mutations, CNAs, and gene expression from cancer patients to identify mutated core modules in cancer | It provides flexibility by using two input parameters to balance different sources of data |
| NetBox | Uses biological networks to assess network modules statistically and identify core pathways in glioblastoma (GBM) | It has only been applied to glioblastoma |
| TieDIE | Applies network diffusion to discover the relationship of genomic events and changes in cancer subtypes | It has a high computational cost |
| CICERO | Uses RNA sequencing data and extensive annotation to detect driver fusions with a local assembly-based algorithm | It may miss low-expressed gene fusions |
| Hamilton et al. | Uses the pan-cancer dataset of TCGA and the miRNA target data of AGO-CLIP to detect a pan-cancer oncogenic miRNA superfamily with a central core seed motif | It discovers a miRNA driver superfamily consisting of |
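
Several of the module-identification methods above (CoMEt, WeSME, MEMo) score the mutual exclusivity of mutations. As an illustration only, and not the statistic implemented by any of these tools, a one-sided Fisher's exact test on a single gene pair asks whether the two genes are co-mutated in fewer samples than chance would predict. The function name and the sample sets below are made up.

```python
from math import comb


def mutual_exclusivity_pvalue(mutated_a, mutated_b, n_samples):
    """P(co-mutated samples <= observed) under a hypergeometric null.

    mutated_a, mutated_b: sets of sample IDs in which each gene is mutated.
    n_samples: total number of samples in the cohort.
    """
    a, b = len(mutated_a), len(mutated_b)
    observed_overlap = len(mutated_a & mutated_b)
    smallest_possible = max(0, a + b - n_samples)
    total = comb(n_samples, b)
    lower_tail = sum(
        comb(a, k) * comb(n_samples - a, b - k)
        for k in range(smallest_possible, observed_overlap + 1)
    )
    return lower_tail / total


# Hypothetical cohort of 10 samples; the two genes never co-occur.
gene_a_samples = {0, 1, 2, 3, 4}
gene_b_samples = {5, 6, 7, 8, 9}
print(mutual_exclusivity_pvalue(gene_a_samples, gene_b_samples, 10))
```

A small p-value suggests the pair is mutated in a mutually exclusive pattern. Testing many pairs this way would require multiple-testing correction, which the sketch leaves out.
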
Summary of methods for identifying personalised cancer drivers
| Method | Description and reference | Additional information |
|---|---|---|
| DawnRank | A ranking framework which applies PageRank to evaluate the impact of genes in an interaction network to detect cancer drivers | It relies on the same gene network for all patients, which may reduce the personalised information |
| SCS | Detects the minimal set of mutated genes controlling the maximal differentially expressed genes as cancer drivers | It builds a gene network for each patient; its application is limited as it requires the corresponding normal sample for each patient |
| PNC | Identifies cancer drivers as the minimum gene set which covers all the edges based on a bipartite graph | It also requires the corresponding normal sample for each patient |
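
DawnRank is described above as applying PageRank to a gene interaction network. The sketch below is plain PageRank by power iteration on a toy directed network, not DawnRank itself. The damping factor of 0.85, the iteration count, and the gene names are all assumptions chosen for illustration.

```python
def pagerank(edges, nodes, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on a directed graph."""
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical network: three genes all point to HUB, which points back to G1.
toy_nodes = ["G1", "G2", "G3", "HUB"]
toy_edges = [("G1", "HUB"), ("G2", "HUB"), ("G3", "HUB"), ("HUB", "G1")]
scores = pagerank(toy_edges, toy_nodes)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Because the network is the only input here, every patient would receive the same ranking, which illustrates the limitation noted in the table for methods that share one gene network across patients.
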
Figure 3. Comparison of the F1 scores of ActiveDriver, DawnRank, DriverML, DriverNet, MutSigCV, OncodriveFM, PNC, and SCS in identifying coding cancer drivers at the population level. The x-axis indicates the eight methods and the y-axis shows the F1 score. The results are based on the cancer driver predictions of the eight methods for five cancer types: BRCA, LUAD, LUSC, KIRC, and HNSC.
F1 scores of the eight methods in predicting drivers for the five cancer types
| No. | Method | BRCA | LUAD | LUSC | KIRC | HNSC |
|---|---|---|---|---|---|---|
| 1 | ActiveDriver | 0.062 | 0.035 | 0.046 | 0.054 | 0.080 |
| 2 | DawnRank | 0.045 | 0.043 | 0.040 | 0.040 | 0.043 |
| 3 | DriverML | 0.077 | 0.032 | 0.019 | 0.053 | 0.006 |
| 4 | DriverNet | NA | NA | 0.016 | 0.030 | NA |
| 5 | MutSigCV | 0.066 | 0.037 | 0.016 | 0.019 | 0.040 |
| 6 | OncodriveFM | 0.024 | 0.030 | 0.0101 | 0.016 | 0.046 |
| 7 | PNC | 0.178 | 0.174 | 0.182 | 0.188 | 0.115 |
| 8 | SCS | NA | 0.011 | 0.005 | 0.008 | NA |
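
The scores in the table above compare predicted drivers against a reference list (the Figure 4 caption mentions validation by the CGC). Assuming the metric is the usual F1 score, i.e. the harmonic mean of precision and recall, it could be computed as in the following sketch. The gene lists are illustrative and are not taken from the study.

```python
def f1_score(predicted, reference):
    """Harmonic mean of precision and recall for a predicted gene list."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # fraction of predictions that are correct
    recall = true_positives / len(reference)     # fraction of the reference that is recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction and reference lists (GENE_X, GENE_Y are placeholders).
predicted_drivers = ["TP53", "PTEN", "GENE_X", "GENE_Y"]
reference_drivers = ["TP53", "PTEN", "AKT1", "CDKN1B", "PIK3CA", "KRAS"]
print(f1_score(predicted_drivers, reference_drivers))
```

Since recall is measured against the entire reference list, a method that returns only a short list of correct genes can still obtain a low score.
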
Figure 4. Overlap among the cancer drivers predicted by different methods. The charts illustrate the overlap among the cancer drivers at the population level predicted by the five methods (DriverML, ActiveDriver, DriverNet, MutSigCV, and OncodriveFM) for each of the five cancer types: BRCA, LUAD, LUSC, KIRC, and HNSC. In each chart, the horizontal bars at the bottom left show the number of detected cancer drivers validated by the CGC, and the vertical bars and dotted lines show the overlap of the validated cancer drivers among the methods. A set with no overlap is shown as a single black dot.
GO biological processes involved in breast cancer in which the predicted cancer drivers are enriched
| Term | #Genes | p-value |
|---|---|---|
| GO:0045598 regulation of fat cell differentiation | 5 | 2.0e-03 |
| GO:0045596 negative regulation of cell differentiation | 6 | 3.6e-03 |
| GO:0045604 regulation of epidermal cell differentiation | 3 | 1.2e-02 |
| GO:0042127 regulation of cell proliferation | 10 | 2.5e-02 |
| GO:0045599 negative regulation of fat cell differentiation | 3 | 2.8e-02 |
| GO:0045580 regulation of T cell differentiation | 3 | 2.9e-02 |
| GO:2000736 regulation of stem cell differentiation | 4 | 3.1e-02 |
KEGG pathways involved in breast cancer in which the predicted cancer drivers are enriched
| Term | #Genes | p-value |
|---|---|---|
| ErbB signaling pathway | 6 | 5.3e-06 |
| Thyroid hormone signaling pathway | 6 | 2.8e-05 |
| Sphingolipid signaling pathway | 6 | 3.1e-05 |
| Neurotrophin signaling pathway | 6 | 3.0e-05 |
| PI3K-Akt signaling pathway | 8 | 1.7e-04 |
| AGE-RAGE signaling pathway in diabetic complications | 5 | 1.7e-04 |
| HIF-1 signaling pathway | 5 | 1.7e-04 |
| FoxO signaling pathway | 5 | 5.1e-04 |
| Fc epsilon RI signaling pathway | 4 | 5.2e-04 |
| Toll-like receptor signaling pathway | 4 | 2.2e-03 |
| TNF signaling pathway | 4 | 2.7e-03 |
| Relaxin signaling pathway | 4 | 4.6e-03 |
| VEGF signaling pathway | 3 | 5.1e-03 |
| Estrogen signaling pathway | 4 | 5.3e-03 |
| mTOR signaling pathway | 4 | 7.3e-03 |
| Prolactin signaling pathway | 3 | 7.4e-03 |
| B cell receptor signaling pathway | 3 | 7.6e-03 |
| p53 signaling pathway | 3 | 7.8e-03 |
| MAPK signaling pathway | 5 | 1.2e-02 |
| T cell receptor signaling pathway | 3 | 1.8e-02 |
| Rap1 signaling pathway | 4 | 1.8e-02 |
| C-type lectin receptor signaling pathway | 3 | 1.9e-02 |
| AMPK signaling pathway | 3 | 2.6e-02 |
| Apelin signaling pathway | 3 | 3.5e-02 |
| Insulin signaling pathway | 3 | 3.4e-02 |
| Phospholipase D signaling pathway | 3 | 4.1e-02 |
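
Enrichment p-values such as those in the two tables above are commonly obtained from a hypergeometric (one-sided Fisher) test. The source does not state which tool or test produced them, so this is an assumption. The sketch asks how likely it is to see at least the observed number of term genes in a gene list drawn at random from a background. All numbers in the example are invented.

```python
from math import comb


def enrichment_pvalue(hits, term_size, list_size, background_size):
    """P(at least `hits` term genes in the list) under a hypergeometric null.

    hits: genes in the list that are annotated to the term.
    term_size: genes annotated to the term in the background.
    list_size: number of genes in the tested list.
    background_size: total number of background genes.
    """
    largest_possible = min(term_size, list_size)
    total = comb(background_size, list_size)
    upper_tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, largest_possible + 1)
    )
    return upper_tail / total


# Invented example: 4 of 5 listed genes fall in a 4-gene term, background of 20.
print(enrichment_pvalue(hits=4, term_size=4, list_size=5, background_size=20))
```

Raw p-values of this kind are usually adjusted for the number of terms tested. Whether the values in the tables are adjusted is not stated in the source.
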
Figure 5. Survival curves, clustering display, and silhouette plot. The survival curves are for cancer subtypes identified by using the four predicted cancer drivers: AKT1, PTEN, CDKN1B, and TP53. They show a significant difference in survival between patients of the two subtypes (p-value = 0.0245). The clustering display indicates a high-quality clustering, with samples in each subtype being similar to one another (light dots show the similarity of samples). The silhouette plot has a large average silhouette width (0.76 out of a maximum of 1), supporting the validity of the clustering based on these four genes.
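
The average silhouette width cited in the Figure 5 caption summarises how much closer each sample is to its own cluster than to the nearest other cluster. Values near 1 indicate well-separated clusters. A minimal sketch follows, assuming Euclidean distance and at least two clusters. The data points are toy values, not expression data from the study.

```python
from math import dist


def average_silhouette_width(points, labels):
    """Mean silhouette width; requires at least two distinct labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy one-dimensional data forming two tight, well-separated groups.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [0, 0, 1, 1]
print(round(average_silhouette_width(toy_points, toy_labels), 3))
```
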