Chengliang Dong1,2, Yunfei Guo1,2, Hui Yang1,3, Zeyu He4, Xiaoming Liu5,6, Kai Wang7,8.
Abstract
Cancer results from the acquisition of somatic driver mutations. Several computational tools can predict driver genes from population-scale genomic data, but tools for analyzing personal cancer genomes are underdeveloped. Here we developed iCAGES, a novel statistical framework that infers driver variants by integrating contributions from coding, non-coding, and structural variants, identifies driver genes by combining genomic information and prior biological knowledge, then generates prioritized drug treatment. Analysis on The Cancer Genome Atlas (TCGA) data showed that iCAGES predicts whether patients respond to drug treatment (P = 0.006 by Fisher's exact test) and long-term survival (P = 0.003 from Cox regression). iCAGES is available at http://icages.wglab.org.
Keywords: Cancer genomics; Machine learning; Precision medicine; Precision oncology; TCGA
Year: 2016 PMID: 28007024 PMCID: PMC5180414 DOI: 10.1186/s13073-016-0390-0
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Functionality comparison between iCAGES and other cancer driver gene detection tools. Features compared for each tool: VCF input format, single patient analysis, structural variation analysis, web server, personalized drug, graphical result, prior knowledge integration, and non-coding variant. The per-tool "V" ("available") marks did not survive extraction; the table retains the tool classification.
| Category | Sub category | Sub subcategory | Output | Tool |
|---|---|---|---|---|
| Genomic variant analysis tools | Batch analysis tools | Driver point mutation prioritization | Protein-coding drivers | CHASM |
| | | | | Mutation Assessor |
| | | | | FATHMM |
| | | | | SIFT |
| | | | | PolyPhen-2 |
| | | | | GERP++ |
| | | | | VEST |
| | | | Non-coding drivers | FunSeq2 |
| | | Driver gene prioritization | | MuSiC |
| | | | | MutSigCV |
| | | | | Youn-Simon |
| | Personal analysis tools | Driver gene prioritization | | |
| | | | | Phen-Gen |
| | | | | OncoIMPACT |
| Transcriptomic expression analysis tools | | | | PARADIGM-SHIFT |
| | | | | DawnRank |
| | | | | CONEXIC |
| | | | | TieDIE |
| | | | | DriverNet |
| | | | | Memo |
| | | | | Dendrix |
| Phosphorylation analysis tools | | | | ActiveDriver |
Fig. 1 Analysis of each predictor selected for the radial SVM modeling for the iCAGES variant score. a Correlation diagrams illustrating the pairwise Pearson correlation between all predictors and the outcome variable in the training dataset. The color and size of the shaded region in the pie charts at the upper right indicate the level of correlation, with red and larger proportions of the shaded region indicating higher positive correlation. b Violin plots of scores from different predictors (different colors) in the training dataset in the TP (deleterious) and TN (neutral) groups. Each plot shows the median (the small white dot), the interquartile range from the first to the third quartile (the thick, solid vertical band), and the density (different colors) of the predictor scores in each group
Major output files of iCAGES
| Category | Column name | Description | Example |
|---|---|---|---|
| Mutation prioritization output | Gene name | HUGO name of the gene | ARAF |
| | Chromosome number | Chromosome number of this mutation | 1 |
| | Coordinate | Genomic coordinate of the mutation | 1234560 |
| | Reference allele | Reference allele of this mutation | C |
| | Alternative allele | Alternative allele of this mutation | G |
| | Mutation category | Whether this mutation is a point coding mutation, point non-coding mutation, or structural variation | Point coding mutation |
| | Mutation context | Genomic context of the mutation | c.641C>G |
| | Protein context | Protein context of the mutation | p.S214C |
| | Score category | If the mutation is a point coding mutation, then the category is the radial SVM score; if a point non-coding mutation, then the FunSeq2 score; if a structural variation, then the normalized CNV signal score | Radial SVM |
| | Driver mutation score | Value of the corresponding score | 0.932 |
| Gene prioritization output | Gene name | HUGO name of the gene | ARAF |
| | Gene category | Whether this gene belongs to the Cancer Gene Census, KEGG Cancer Pathway, or other categories | KEGG cancer pathway |
| | Maximum radial SVM score | Maximum radial SVM score of all point coding mutations in the gene | 0.932 |
| | Maximum FunSeq2 score | Maximum FunSeq2 score of all point non-coding mutations in the gene | 0.000 |
| | Maximum normalized CNV signal score | Maximum normalized CNV signal score of all structural variations in the gene | 0.000 |
| | Phenolyzer score | Phenolyzer score of the gene | 0.306 |
| | iCAGES gene score | iCAGES gene score of the gene | 0.484 |
| Drug prioritization output | Drug name | Name of the drug | SORAFENIB |
| | Final target gene | Mutated gene in the patient finally targeted by the drug | ARAF |
| | Direct target gene | Mutated gene in the patient directly targeted by the drug | ARAF |
| | iCAGES gene score | iCAGES gene score of the target gene | 0.484 |
| | BioSystems normalized relatedness probability | BioSystems normalized relatedness probability between the direct target of the drug and the target gene | 1.000 |
| | PubChem normalized drug active probability | PubChem normalized drug active probability of this drug | 1.000 |
| | iCAGES drug score | iCAGES drug score of the drug | 0.484 |
| | Tier | Tier of this drug: FDA approved (tier 1), undergoing clinical trials (tier 2), or otherwise (tier 3) | 1 |
| | Brand name | Commercial brand name of this drug | NEXAVAR |
| | FDA approved subtype | Cancer subtypes for which the FDA has approved this drug | Hepatocellular carcinoma, renal cancer, thyroid cancer |
| | Clinical trial name | Name of the active clinical trial on this drug | Sorafenib phase II study for Japanese anaplastic or medullary thyroid carcinoma patients |
| | Clinical trial organization | The organization for this clinical trial | BAYER |
| | Clinical trial phase | Phase of this clinical trial | 2 |
| | Clinical trial URL | URL of this clinical trial | |
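The two rule-based columns in the output table ("Score category" and "Tier") can be illustrated with a minimal sketch. The function names and the boolean inputs are hypothetical and are not part of the iCAGES package; only the mapping rules themselves come from the column descriptions.

```python
def score_category(mutation_category: str) -> str:
    """Return the score type reported for a given mutation category.

    Mapping taken from the "Score category" description; the function
    itself is an illustrative helper, not an iCAGES API.
    """
    rules = {
        "point coding mutation": "Radial SVM",
        "point non-coding mutation": "FunSeq2",
        "structural variation": "Normalized CNV signal",
    }
    key = mutation_category.strip().lower()
    if key not in rules:
        raise ValueError(f"unknown mutation category: {mutation_category!r}")
    return rules[key]


def drug_tier(fda_approved: bool, in_clinical_trial: bool) -> int:
    """Tier 1: FDA approved; tier 2: undergoing clinical trials; tier 3: otherwise."""
    if fda_approved:
        return 1
    if in_clinical_trial:
        return 2
    return 3
```

In this sketch FDA approval takes precedence, so a drug that is both approved and in trials is assigned tier 1; that precedence is an assumption suggested by the order of the tiers.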
Fig. 2 The iCAGES package as three layers. The input file contains all variants identified from the patient, in either ANNOVAR input format or VCF format. The first layer of iCAGES prioritizes mutations. It computes three feature scores for annotating each gene: the radial SVM score for each of its point coding mutations, the normalized CNV peak score for each of its structural variations, and the FunSeq2 score for each of its point non-coding mutations. The second layer of iCAGES prioritizes cancer driver genes. It takes the three feature scores from the first layer, generates the corresponding Phenolyzer score for each mutated gene, and computes an LR score for this gene (iCAGES gene score). The third layer of iCAGES prioritizes targeted drugs. It first queries the DGIdb and FDA drug databases for potential drugs that interact with mutated genes and their neighbors. Next, it calculates the joint probability of each drug being the most effective (iCAGES drug score) from three feature scores: the iCAGES gene score for its direct/indirect target, the normalized BioSystems probability measuring the maximum relatedness of a drug's direct target with each mutated gene (final target), and the PubChem active probability measuring the bioactivity of the drug. The final output of iCAGES consists of three major elements: a prioritized list of mutations, a prioritized list of genes with their iCAGES gene scores, and a prioritized list of targeted drugs with their iCAGES drug scores
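The second and third layers can be sketched as follows. This is a rough illustration under stated assumptions, not the iCAGES implementation: "LR" is read here as logistic regression, the intercept and weights are hypothetical placeholders (the fitted coefficients are not given in this text), and the joint probability is taken to be a plain product of the three feature scores, which agrees with the example row of the drug prioritization output (0.484 × 1.000 × 1.000 = 0.484).

```python
import math

# Hypothetical coefficients for illustration only; not the fitted iCAGES model.
INTERCEPT = -2.0
WEIGHTS = {"radial_svm": 2.0, "funseq2": 1.0, "cnv": 1.0, "phenolyzer": 1.5}


def gene_score(features: dict) -> float:
    """Logistic-regression style score from the four per-gene features.

    Missing features default to 0.0, mirroring the 0.000 entries in the
    gene prioritization output.
    """
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def drug_score(target_gene_score: float, relatedness: float, activity: float) -> float:
    """Joint probability as a product of three probabilities (assumed independent)."""
    for p in (target_gene_score, relatedness, activity):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    return target_gene_score * relatedness * activity
```

Because the weights are placeholders, `gene_score` will not reproduce the scores in the example output; only the shape of the computation is meant to carry over.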
Fig. 3 Performance of the first layer of iCAGES. a Performance of the radial SVM score evaluated on the COSMIC version 68 testing dataset (testing dataset I). A higher AUC score indicates better performance. The 95% CI was computed with 2000 stratified bootstrap replicates. b Performance of the radial SVM score evaluated on Cancer Gene Census genes from the COSMIC version 68 testing dataset (testing dataset II)
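A confidence interval of the kind described in the caption can be sketched with the standard library alone. The sketch assumes a percentile bootstrap and resamples positives and negatives separately (the "stratified" part); the caption does not say which interval method or software was used, and the function names are illustrative.

```python
import random


def auc(pos, neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_bootstrap_ci(pos, neg, replicates=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each class separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replicates):
        boot_pos = rng.choices(pos, k=len(pos))  # resample with replacement
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * replicates)]
    upper = stats[min(replicates - 1, int((1.0 - alpha) * replicates))]
    return lower, upper
```

Resampling within each class keeps the number of positives and negatives fixed in every replicate, so no replicate ends up with an empty class.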
Fig. 4 Performance of the second layer of iCAGES. a Performance of the iCAGES score compared to MutSigCV, evaluated on 14,169 TCGA patients. A higher AUC score indicates better performance. The 95% CI was computed with 2000 stratified bootstrap replicates (testing dataset I). b Performance of iCAGES compared to IntOgen, evaluated on data from 6748 patients used in the Rubio-Perez et al. study. Each bar represents the number of patients whose cancer driver gene can be identified by iCAGES or by IntOgen. Top One, Top Five, Top Ten and Top Twenty refer to using the top gene, top five genes, top ten genes, and top twenty genes from the prediction, respectively. A significant advantage of iCAGES compared to other tools is indicated with ***P ≤ 0.0001 (Bonferroni correction; testing dataset II). c Performance of iCAGES compared to IntOgen, Phen-Gen, and MuSiC evaluated on data from 3178 patients used in the Kandoth et al. study (testing dataset III). d Performance of iCAGES compared to IntOgen, Phen-Gen, and MuSiC evaluated on data from 71 patients used in the Kandoth et al. study but not in the Rubio-Perez et al. study (testing dataset IV)
Fig. 5 Performance of the third layer of iCAGES. a–c Kaplan–Meier survival curves for 124 TCGA patients with targeted therapy with unknown response whose data were also used in the Rubio-Perez et al. study (testing dataset I). a Red and blue curves represent patients whose treatments do and do not contain iCAGES-predicted first tier drugs, respectively. Red and blue areas represent the 95% confidence interval for the survival curve. b Red and blue curves represent patients whose treatments do and do not contain Rubio-Perez et al.-predicted drugs, respectively. c Red and blue curves represent patients whose treatments do and do not contain DGIdb-predicted drugs, respectively. d Number of TCGA patients with targeted therapy with complete response or progressive disease who received correct iCAGES-predicted drugs (blue), DGIdb drugs (gray), or Rubio-Perez et al. tier one drugs (orange). e Number of patients used in the Rubio-Perez et al. study who can potentially benefit from iCAGES (without the pathway component from BioSystems) predicted drugs from three tiers (blue), iCAGES-predicted drugs (green), or Rubio-Perez et al.-predicted drugs (orange). A significant advantage of iCAGES compared to other tools is indicated with ***P ≤ 0.0001 (Bonferroni correction; testing dataset III)
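The Kaplan–Meier curves in panels a–c rest on the product-limit estimator, which can be sketched in a few lines. The function name and the input layout are illustrative assumptions, and the sketch omits the confidence bands shown in the figure.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times  -- follow-up duration for each patient
    events -- True if death was observed at that time, False if censored
    Returns a list of (time, survival_probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve
```

Comparing two groups, as in the figure, would mean calling the function once per group (for example, patients whose treatment does or does not contain a predicted drug) and plotting the two step curves together.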
Fig. 6 The web interface of iCAGES, as demonstrated using data from Imielinski et al. a The submission page for iCAGES. Users can enter data in VCF format (default) or in the input format used by the ANNOVAR package. b Dynamic form for advanced users. Users can click "Advanced Options" and enter additional information, such as structural variations in BED format, cancer subtype, and drugs that the patient has been using. c Bubble plot output of the iCAGES package. The size of a bubble indicates the weight of the iCAGES score and its color indicates the category of the gene. Red, blue, and green indicate that the gene belongs to the Cancer Gene Census, the KEGG cancer pathway, or neither category, respectively. Pink bubbles connected to blue, green or red bubbles indicate targeted drugs. d The corresponding bar plot of the output. The length of a bar indicates the weight of the iCAGES score and its color indicates the category of the gene