Michael P Schroeder, Carlota Rubio-Perez, David Tamborero, Abel Gonzalez-Perez, Nuria Lopez-Bigas.
Abstract
MOTIVATION: Several computational methods have been developed to identify cancer driver genes, that is, genes responsible for cancer development upon specific alterations. These alterations can cause the loss of function (LoF) of the gene product, for instance in tumor suppressors, or increase or change its activity or function, if it is an oncogene. Distinguishing between these two classes is important to understand tumorigenesis in patients and has implications for therapy decision making. Here, we assess the capacity of multiple gene features related to the pattern of genomic alterations across tumors to distinguish between activating and LoF cancer genes, and we present an automated approach to aid the classification of novel cancer drivers according to their role. RESULT: OncodriveROLE is a machine learning-based approach that classifies driver genes according to their role, using several properties related to the pattern of alterations across tumors. The method shows an accuracy of 0.93 and a Matthews correlation coefficient of 0.84 when classifying genes in the Cancer Gene Census. The OncodriveROLE classifier, its results when applied to two lists of predicted cancer drivers, and the TCGA-derived mutation and copy number features used by the classifier are available at http://bg.upf.edu/oncodrive-role.
Year: 2014 PMID: 25161246 PMCID: PMC4147920 DOI: 10.1093/bioinformatics/btu467
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
List of mutational and CNA features for cancer driver genes
| Attribute name | Description |
|---|---|
| CNA_cbs_countGain | # samples in cohort with CBS value > 1.1 |
| CNA_cbs_countLoss | # samples in cohort with CBS value < 1.1 |
| CNA_cbs_logratio_GvL | Log10-ratio of countGain VS countLoss |
| CNA_gain_freq | # samples in cohort with CBS value > 1.1 / cohort size |
| CNA_loss_freq | # samples in cohort with CBS value < 1.1 / cohort size |
| MUTS_clusters_miss_VS_pam | Log10-ratio of missense VS PAM within OncodriveCLUST peaks |
| MUTS_freq_clustered | # of mutations in OncodriveCLUST peaks / # of samples with gene mutated |
| MUTS_freq_disruptive | # of samples with truncating mutations or high impact missense / # of samples having gene mutations |
| MUTS_freq_missH | # of high impact missense mutations not in OncodriveCLUST peaks / # samples with gene mutated |
| MUTS_freq_missHM | # of high and medium impact missense mutations not in OncodriveCLUST peaks / # samples with gene mutated |
| MUTS_freq_truncating | # of samples with truncating mutations / # of samples with at least one mutation |
| MUTS_missense_clustercov | # missense mutations in OncodriveCLUST peaks / # missense mutations / # amino acids covered by peaks |
| MUTS_missense_mutrec | # recurrent missense mutations / # high and medium impact missense mutations |
| MUTS_missense_rec_freq | # recurrent missense mutations / # mutations (as in Vogelstein et al.) |
| MUTS_missense_recHM | # samples with high and medium impact recurrent missense mutations / # samples with missense mutations |
| MUTS_OncoFM_pvalue | OncodriveFM P-value |
| MUTS_pams_count | # samples with PAM |
| MUTS_pams_freq | # samples with PAM / # samples with gene mutations |
| MUTS_pams_ratio | # samples with PAM VS # samples with no PAM |
| MUTS_pamsrec_freq | # samples with PAM VS # of samples with gene mutation |
| MUTS_trunc_count | # samples with truncating mutations |
| MUTS_trunc_freq_cohort | # of truncating mutations / # of samples with gene mutations |
| MUTS_trunc_mutfreq | # truncating mutations / # mutations (as in Vogelstein et al.) |
| MUTS_trunc_vs_missbenign_ratio | # samples with truncating mutations VS # samples with benign missense mutations |
| MUTS_trunc_vs_missense_ratio | # samples with truncating mutations VS # samples with missense mutations |
| MUTS_trunc_vs_notrunc_ratio | # samples with truncating mutations VS # samples without truncating mutations |
| MUTS_tuson_missHM_missbenign_ratio | # samples with high and medium impact mutations VS # samples with benign missense mutations (as described in Davoli et al.) |
| MUTS_tuson_splicing_missbenign_ratio | # samples with splicing variant mutations VS # samples with benign missense mutations (as described in Davoli et al.) |
| MUTS_tuson_trunc_missbenign_ratio | # samples with truncating (excluding splicing variants) mutations VS # samples with benign missense mutations (as described in Davoli et al.) |
Note: List of features initially created for characterizing LoF and Act genes. Each description reflects the formula applied to calculate the feature. All features describe either mutation or CNA characteristics. Abbreviations used in the descriptions: # (number sign): count/number of; / (slash): divided by; CBS: circular binary segmentation; truncating mutations: frameshift, stop gained and lost, splice donor and acceptor; missense: all missense mutations and insertions and deletions not altering the reading frame; high and medium impact mutations: all missense mutations with a TransFIC impact of 1 and 2; benign missense: all missense mutations with low or unknown TransFIC impact; PAM: protein-affecting mutations (frameshift, stop gained and lost, splice donor and acceptor, missense); (gene) mutations: all mutations affecting the coding sequence; VS: versus (a ratio has been obtained).
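To make the formulas in the table concrete, the following minimal sketch computes three of the truncation-based features for a single gene. It is only an illustration under stated assumptions: the input layout (a list of `(sample_id, consequence)` pairs), the consequence labels and the function name `truncation_features` are hypothetical and are not taken from the OncodriveROLE code.

```python
# Hypothetical consequence labels for the truncating classes named in the note:
# frameshift, stop gained and lost, splice donor and acceptor.
TRUNCATING = {
    "frameshift",
    "stop_gained",
    "stop_lost",
    "splice_donor",
    "splice_acceptor",
}


def truncation_features(mutations):
    """Compute three truncation features for one gene.

    mutations: list of (sample_id, consequence) pairs, one per mutation
    observed in the gene across the cohort (assumed input layout).
    """
    if not mutations:
        raise ValueError("gene has no mutations")

    mutated_samples = {sample for sample, _ in mutations}
    trunc_samples = {s for s, cons in mutations if cons in TRUNCATING}
    n_trunc_mutations = sum(1 for _, cons in mutations if cons in TRUNCATING)

    return {
        # samples with truncating mutations
        "MUTS_trunc_count": len(trunc_samples),
        # samples with truncating mutations / samples with at least one mutation
        "MUTS_freq_truncating": len(trunc_samples) / len(mutated_samples),
        # truncating mutations / mutations
        "MUTS_trunc_mutfreq": n_trunc_mutations / len(mutations),
    }


if __name__ == "__main__":
    toy_gene = [
        ("s1", "frameshift"),
        ("s1", "missense"),
        ("s2", "missense"),
        ("s3", "stop_gained"),
        ("s3", "frameshift"),
    ]
    print(truncation_features(toy_gene))
```

Note the difference between sample-level and mutation-level counts: a sample carrying two truncating mutations contributes once to `MUTS_freq_truncating` but twice to `MUTS_trunc_mutfreq`.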
Fig. 1. (A) The list of features ordered by Mann–Whitney–Wilcoxon rank sum test P-value significance. Features dependent on truncating mutations are the best discriminators for LoF and Act genes. Features described in (B) are marked with an asterisk. A detailed explanation of each feature can be found in Table 1. (B) Box plots comparing the distribution of the three non-redundant top-ranking features that have been selected for the OncodriveROLE classifier in CGC genes annotated as Dom and Rec.
Fig. 2. Classification of 200 (HCD list) and 144 (Cancer5000 list) cancer driver genes into the classes Act and LoF. The training set of OncodriveROLE consists of all ‘Dom’ and ‘Rec’ labeled data points. ‘Dom?’ are CGC-annotated dominant genes excluded from the training set because of their strong resemblance to the ‘Rec’ genes and previous literature evidence of this role. ‘DomT’ genes are CGC-annotated dominant genes that cite only translocation events as proof and are therefore not included in the training set. All ‘-’ labeled data points are driver genes not annotated in CGC, whose prediction was the main goal of the study. The thresholds are drawn at 0.3 (as top limit of the LoF class) and 0.7 (as bottom limit of the Act class).

Working with classification score thresholds of 0.3 (as top limit of the LoF class) and 0.7 (as bottom limit of the Act class), we classified 109 genes as LoF and 76 as Activating, and left 15 genes unclassified in the HCD list; in the Cancer5000 list, we classified 97 genes as LoF and 43 as Activating, and left 4 genes unclassified (Fig. 2). Genes for which we observed fewer than 12 mutations were directly classified as ‘No class’ and assigned NA values in the classification results (see Supplementary Tables S4 and S6).
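The thresholding rule described above can be sketched as a small function. This is an illustrative reading of the text, not the published implementation: the function name `assign_role` is hypothetical, and treating the limits as inclusive (a score of exactly 0.3 is LoF, exactly 0.7 is Act) is an assumption, since the text does not say how ties are handled.

```python
def assign_role(score, n_mutations, lof_max=0.3, act_min=0.7, min_mutations=12):
    """Map a classifier score in [0, 1] to a role label.

    Low scores indicate LoF, high scores indicate Act. Genes with fewer
    than `min_mutations` observed mutations are not scored at all.
    Inclusive boundaries are an assumption made for this sketch.
    """
    if n_mutations < min_mutations:
        return "No class"
    if score <= lof_max:
        return "LoF"
    if score >= act_min:
        return "Act"
    return "Unclassified"


if __name__ == "__main__":
    # Toy (score, mutation count) pairs, invented for illustration only.
    for score, n in [(0.05, 40), (0.55, 40), (0.92, 40), (0.92, 8)]:
        print(score, n, assign_role(score, n))
```

Passing `lof_max=0.2, act_min=0.8` gives the stricter 0.2/0.8 setting, which trades coverage for fewer borderline calls.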
List of approaches and their performance on trimmed CGC dataset
| Method | ACC | MCC | COV (%) |
|---|---|---|---|
| OncodriveROLE | 0.925 | 0.848 | 83 |
| 20-20 rule | 0.895 | 0.769 | 75 |
| Tuson | 0.914 | 0.817 | 92 |
a Results of leave-one-out cross-validation.
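For readers unfamiliar with the two metrics in the table, the sketch below shows how accuracy (ACC) and the Matthews correlation coefficient (MCC) are obtained from a binary confusion matrix. Treating Act as the positive class and LoF as the negative class is an assumption made for illustration, and the counts in the usage example are invented; they are not the confusion matrix behind the reported values.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of classified genes whose predicted role matches the annotation."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented counts, for illustration only.
    tp, tn, fp, fn = 45, 40, 5, 10
    print("ACC:", accuracy(tp, tn, fp, fn))
    print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC stays informative when the two classes are of unequal size, which is why both are reported side by side.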
List of approaches and their performance on the 290 drivers from the HCD list and 260 drivers from the Cancer5000 list
| Method | Act/Oncogene | LoF/Tumour suppressor | Unclassified | Coverage (%) |
|---|---|---|---|---|
| HCD | | | | |
| OncodriveROLE 0.3/0.7 | 76 | 109 | 15 | 92 |
| OncodriveROLE 0.2/0.8 | 58 | 96 | 46 | 77 |
| 20-20 rule | 23 | 96 | 81 | 60 |
| Tuson | 44 | 92 | 64 | 68 |
| Cancer5000 | | | | |
| OncodriveROLE 0.3/0.7 | 43 | 97 | 4 | 97 |
| OncodriveROLE 0.2/0.8 | 40 | 91 | 13 | 91 |
| 20-20 rule | 18 | 90 | 36 | 75 |
| Tuson | 32 | 90 | 22 | 85 |
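The coverage column appears to be the share of genes that received a class, that is, (Act + LoF) divided by (Act + LoF + Unclassified). The short sketch below recomputes it from the counts in the table under that assumption; the function name `coverage_percent` is hypothetical.

```python
def coverage_percent(n_act, n_lof, n_unclassified):
    """Percentage of genes assigned to either the Act or the LoF class."""
    total = n_act + n_lof + n_unclassified
    if total == 0:
        raise ValueError("no genes to summarize")
    return 100.0 * (n_act + n_lof) / total


# (Act, LoF, Unclassified) counts copied from the table above.
HCD = {
    "OncodriveROLE 0.3/0.7": (76, 109, 15),
    "OncodriveROLE 0.2/0.8": (58, 96, 46),
    "20-20 rule": (23, 96, 81),
    "Tuson": (44, 92, 64),
}
CANCER5000 = {
    "OncodriveROLE 0.3/0.7": (43, 97, 4),
    "OncodriveROLE 0.2/0.8": (40, 91, 13),
    "20-20 rule": (18, 90, 36),
    "Tuson": (32, 90, 22),
}

if __name__ == "__main__":
    for list_name, rows in [("HCD", HCD), ("Cancer5000", CANCER5000)]:
        for method, counts in rows.items():
            print(f"{list_name:<11}{method:<24}{coverage_percent(*counts):.1f}")
```

Each row in the HCD block sums to 200 genes and each row in the Cancer5000 block to 144, matching the totals given in the Fig. 2 caption.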