Christos M Dimitrakopoulos, Niko Beerenwinkel.
Abstract
High-throughput DNA sequencing techniques enable large-scale measurement of somatic mutations in tumors. Cancer genomics research aims at identifying all cancer-related genes and solid interpretation of their contribution to cancer initiation and development. However, this venture is characterized by various challenges, such as the high number of neutral passenger mutations and the complexity of the biological networks affected by driver mutations. Based on biological pathway and network information, sophisticated computational methods have been developed to facilitate the detection of cancer driver mutations and pathways. They can be categorized into (1) methods using known pathways from public databases, (2) network-based methods, and (3) methods learning cancer pathways de novo. Methods in the first two categories use and integrate different types of data, such as biological pathways, protein interaction networks, and gene expression measurements. The third category consists of de novo methods that detect combinatorial patterns of somatic mutations across tumor samples, such as mutual exclusivity and co-occurrence. In this review, we discuss recent advances, current limitations, and future challenges of these approaches for detecting cancer genes and pathways. We also discuss the most important current resources of cancer-related genes. WIREs Syst Biol Med 2017, 9:e1364. doi: 10.1002/wsbm.1364 For further resources related to this article, please visit the WIREs website.
Year: 2016 PMID: 27863091 PMCID: PMC5215607 DOI: 10.1002/wsbm.1364
Source DB: PubMed Journal: Wiley Interdiscip Rev Syst Biol Med ISSN: 1939-005X
Figure 1. Detecting novel cancer genes begins with the sequencing of tumor samples (either whole exome or whole genome). The first step is to detect somatic mutations (single‐nucleotide variants, indels, CNVs, structural variants, and gene fusions) from sequencing data using variant callers. The list of variant calls then needs to be filtered to remove neutral passenger mutations and to detect candidate cancer driver genes. The simplest ways to perform such filtering are to detect recurrent variants and to predict the functional impact of each mutation. More sophisticated methods then come into play. They can broadly be categorized into three types: (1) methods that use preexisting pathways, (2) methods that are based on existing biological network data, and (3) methods that predict cancer pathways de novo based on the combinatorial patterns of mutation occurrence in a group of tumors. Finally, the discoveries are validated experimentally.
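The recurrence filter mentioned in the caption can be illustrated with a minimal sketch. It assumes variant calls are available as `(sample_id, gene)` pairs; the function name `recurrent_genes`, the input format, and the `min_fraction` threshold are all hypothetical choices made for illustration and are not taken from any tool discussed in the review.

```python
from collections import Counter


def recurrent_genes(calls, n_samples, min_fraction=0.05):
    """Return genes mutated in at least `min_fraction` of the samples.

    calls:        iterable of (sample_id, gene) pairs (assumed input format).
    n_samples:    total number of sequenced tumor samples.
    min_fraction: illustrative recurrence threshold, not a recommended value.
    """
    # Deduplicate so that several variants in the same gene of the same
    # sample count only once toward that gene's recurrence.
    per_gene = Counter(gene for _, gene in set(calls))
    return {
        gene: count / n_samples
        for gene, count in per_gene.items()
        if count / n_samples >= min_fraction
    }


if __name__ == "__main__":
    calls = [("s1", "TP53"), ("s1", "TP53"), ("s2", "TP53"),
             ("s3", "KRAS"), ("s4", "TTN")]
    print(recurrent_genes(calls, n_samples=4, min_fraction=0.5))
```

Recurrence alone does not separate drivers from passengers (long genes accumulate more mutations by chance), which is why the caption treats it only as a first filtering step.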
Summary of Methods for Predicting and Interpreting Cancer Genes
| Category | Method | Description |
|---|---|---|
| Known pathways | DAVID, | These methods use a statistical test to assess the significance of the overlap between a gene set and a known pathway. |
| | GSEA | Tests if the expression levels of the genes in the gene set correlate with a specific phenotype. |
| | Grossman et al. | Gene Ontology (GO) enrichment analysis that takes the GO tree hierarchy into account. Determines overrepresentation of terms in the context of annotations to the term's parents. |
| | PathScan | Tests if the mutations of different cancer patients exhibit enrichments in the same pathways. |
| Network‐based | NetBox | Detects network modules in a given list of input genes and assesses the statistical significance of their modularity. |
| | DriverNet | Identifies driver mutations by their effect on mRNA expression networks. |
| | Torkamani and Schork | Identifies functionally related gene modules targeted by somatic mutations by reconstructing regulatory interactions. |
| | NBS | Uses network diffusion to stratify patients based on the observation that their aberrations lie in similar network regions. |
| | HotNet2 | Uses insulated network diffusion to detect mutated subnetworks with statistically significant size. Captures the directionality of interactions. |
| | TieDIE | Uses network diffusion to link somatic mutations to transcriptional changes. |
| De novo (combinatorial patterns) | RME | Detects gene modules whose members are recurrently mutated and exhibit mutually exclusive patterns. |
| | Dendrix | Detects driver pathways characterized by high exclusivity and high sample coverage. Requires high coverage of the discovered gene modules instead of each gene separately. |
| | Multi‐Dendrix | Identifies multiple mutually exclusive sets of genes in parallel. |
| | CoMEt | Identifies multiple sets of mutually exclusive genetic alterations from different subtypes of the same cancer type. |
| | TiMEx | Models the interplay between the waiting times to alterations and the observation time. Highly sensitive to low‐frequency driver alterations that occur mutually exclusively. |
| | pathTiMEx | Takes into account the evolutionary order constraints among pathways to detect mutually exclusive cancer alterations. |
| | Sakoparnig et al. | Identifies low‐frequency genomic alterations based on mutational dependencies. |
| | muex | Models the generative process of mutually exclusive patterns in the presence of noise. |
| Combined | MEMo | Detects network cliques of aberrant genes with mutually exclusive patterns. |
| | Mutex | Identifies mutually exclusive groups of genes with a common effect on a given signaling network. |
| | MEMCover | Detects mutually exclusive groups of mutated genes in the same or across different tissues. |
Figure 2. Methods that are based on known pathways to identify cancer drivers. (a) Most methods statistically assess the significance of the overlap between a user‐defined gene set (red‐colored nodes) and a known pathway (blue‐colored nodes) [23, 24, 25]. The user‐defined gene set is usually the result of an experiment (e.g., a list of genetically aberrant genes). Dark red‐colored nodes correspond to the genes that belong both to the known pathway and to the user‐defined gene set. (b) Other methods compute per‐patient enrichment scores [28] for a known pathway in the same way, which are then combined into an overall score.
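The overlap test in panel (a) is commonly carried out as a one-sided hypergeometric test, which the following minimal sketch illustrates. It assumes that both the gene set and the pathway are drawn from a shared background of `background_size` genes; the function name and inputs are hypothetical, and individual tools may use different statistics or multiple-testing corrections.

```python
from math import comb


def overlap_p_value(gene_set, pathway, background_size):
    """One-sided hypergeometric test for gene set / pathway overlap.

    Returns (k, p), where k is the observed overlap and p is the probability
    of seeing an overlap of at least k when a gene set of the same size is
    drawn at random from the background.
    Assumption: both inputs are subsets of the same background of
    `background_size` genes.
    """
    gene_set, pathway = set(gene_set), set(pathway)
    k = len(gene_set & pathway)   # observed overlap
    n = len(gene_set)             # size of the user-defined gene set
    K = len(pathway)              # size of the known pathway
    N = background_size           # number of genes in the background

    # Sum the upper tail P(X = i) for i = k .. min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)


if __name__ == "__main__":
    pathway = {"g1", "g2", "g3", "g4", "g5"}
    aberrant = {"g1", "g2", "g3", "g17"}
    print(overlap_p_value(aberrant, pathway, background_size=20))
```

In practice the test is repeated for every pathway in a database, so the resulting P-values need to be corrected for multiple testing.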
Figure 3. Network‐based methods to identify cancer genes. (a) Network‐based methods like HotNet2 [33] detect cancer driver genes as strongly connected components of aberrant genes in the network. Network diffusion is used to estimate how strongly the aberrant genes are connected in the network. Nodes are initialized with an aberration score that corresponds to the proportion of samples that contain a single‐nucleotide variant or copy number aberration in the gene. Using network diffusion, the aberration scores are spread through the network until an equilibrium state is reached, in which the scores no longer change significantly over time. Nodes correspond to genes and edges to gene interactions. Colors correspond to the amount of aberration score that is concentrated at the node before and after network diffusion (red: high, orange: medium, yellow: low). Gray nodes correspond to very low or zero aberration score. (b) TieDIE [34] also uses network diffusion to capture the genes in the network that link genetically aberrant genes to differentially expressed genes at the transcription level, the so‐called linker genes. Blue‐colored nodes correspond to differentially expressed genes. The linker genes are shown in purple; during network diffusion, they receive aberration scores flowing from both genetically aberrant and differentially expressed genes.
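The diffusion step described in panel (a) can be sketched as a generic random walk with restart, iterated until the scores stop changing. This is an illustration of the general idea only, not the exact insulated diffusion of HotNet2 or the TieDIE procedure; the adjacency format, the `restart` value, and the convergence tolerance are assumptions.

```python
def diffuse(adjacency, initial, restart=0.4, tol=1e-9, max_iter=1000):
    """Spread aberration scores over an undirected gene network.

    adjacency: dict mapping each gene to the set of its neighbours
               (assumed symmetric; every neighbour must also be a key).
    initial:   dict mapping genes to their starting aberration scores.
    restart:   fraction of score re-injected at the source nodes each step
               (illustrative value, not a published default).
    """
    start = {g: float(initial.get(g, 0.0)) for g in adjacency}
    scores = dict(start)
    for _ in range(max_iter):
        # Each node first receives its share of the restart mass.
        new = {g: restart * start[g] for g in adjacency}
        for g, neighbours in adjacency.items():
            if not neighbours:
                # Isolated nodes keep their own score.
                new[g] += (1 - restart) * scores[g]
                continue
            # Split the remaining score equally among the neighbours.
            share = (1 - restart) * scores[g] / len(neighbours)
            for n in neighbours:
                new[n] += share
        change = max(abs(new[g] - scores[g]) for g in adjacency)
        scores = new
        if change < tol:  # equilibrium: no more significant changes
            break
    return scores


if __name__ == "__main__":
    network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    print(diffuse(network, {"A": 1.0}))
```

After diffusion, genes close to many aberrant genes accumulate high scores even if they are rarely mutated themselves, which is what allows subnetworks rather than single genes to be prioritized.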
Figure 4. Methods that identify cancer genes de novo by detecting combinatorial patterns of cancer mutations across patients. Red squares correspond to genetically aberrant genes. The rows of the depicted matrix correspond to different patients and the columns to different genes. (a) Mutual exclusivity and co‐occurrence, as depicted in the figure, are two combinatorial patterns of mutations whose statistical significance can aid in detecting cancer genes de novo. Methods attempt to detect these patterns in the presence of noise; hence, the patterns detected are not perfectly co‐occurring or mutually exclusive. (b) pathTiMEx [40] predicts cancer progression at the level of pathways by introducing a probabilistic waiting time model for mutually exclusive cancer alterations. It explicitly accounts for the evolutionary order constraints among pathways (directed arrows), which would otherwise confound the detection of mutually exclusive gene groups.
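To make the notion of an imperfect mutually exclusive pattern concrete, the sketch below scores a candidate gene set by the number of patients it covers minus the number of redundant mutations within covered patients. This coverage-minus-overlap score is in the spirit of coverage- and exclusivity-based methods such as Dendrix, but the exact form used here, the function names, and the input format are assumptions made for illustration; the probabilistic models (e.g., TiMEx, muex) work differently.

```python
def coverage_and_overlap(mutations, genes):
    """Count covered patients and redundant mutations for a gene set.

    mutations: dict mapping patient id -> set of mutated genes
               (assumed representation of the patient-by-gene matrix).
    genes:     candidate gene set to evaluate.
    """
    genes = set(genes)
    hits = [len(mutated & genes) for mutated in mutations.values()]
    covered = sum(1 for h in hits if h >= 1)       # patients with >= 1 hit
    overlap = sum(h - 1 for h in hits if h >= 1)   # extra hits per patient
    return covered, overlap


def exclusivity_score(mutations, genes):
    """Coverage minus overlap: high when many patients carry exactly one hit."""
    covered, overlap = coverage_and_overlap(mutations, genes)
    return covered - overlap


if __name__ == "__main__":
    patients = {
        "p1": {"A"},
        "p2": {"B"},
        "p3": {"C"},
        "p4": {"A", "B"},   # noise: two genes of the set hit in one patient
        "p5": set(),
    }
    print(coverage_and_overlap(patients, {"A", "B", "C"}))
    print(exclusivity_score(patients, {"A", "B", "C"}))
```

A perfectly exclusive set has zero overlap, so its score equals its coverage; noise in real data lowers the score without necessarily ruling the set out.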
Description of the Output of Each Method
| Method | | | |
|---|---|---|---|
| NetBox | √ | ||
| DriverNet | √ | ||
| Torkamani and Schork | √ | √ | |
| NBS | √ | ||
| HotNet2 | √ | ||
| TieDIE | √ | ||
| RME | √ | ||
| Dendrix | √ | ||
| Multi‐Dendrix | √ | ||
| CoMEt | √ | ||
| TiMEx | √ | ||
| pathTiMEx | √ | ||
| Sakoparnig et al. | √ | ||
| muex | √ | ||
| MEMo | √ | √ | |
| Mutex | √ | √ | |
| MEMCover | √ | √ |
Only network‐based methods and methods based on combinatorial patterns of mutations are included.