Simon J Furney, Borja Calvo, Pedro Larrañaga, Jose A Lozano, Nuria Lopez-Bigas.
Abstract
The development of techniques for oncogenomic analyses, such as array comparative genomic hybridization, messenger RNA expression arrays and mutational screens, has come to the fore in modern cancer research. Studies using these techniques can highlight panels of genes that are altered in cancer. However, these candidate cancer genes must then be scrutinized to reveal whether they contribute to oncogenesis or are coincidental and non-causative. We present a computational method for prioritizing candidate (i) proto-oncogenes and (ii) tumour suppressor genes from oncogenomic experiments. We constructed computational classifiers using different combinations of sequence and functional data, including sequence conservation, protein domains and interactions, and regulatory data. We found that these classifiers are able to distinguish known cancer genes from other human genes. Furthermore, the classifiers also discriminate candidate cancer genes identified in a recent mutational screen from other human genes. We provide a web-based facility through which cancer biologists may access our results, and we propose computational cancer gene classification as a useful method of prioritizing candidate cancer genes identified in oncogenomic studies.
Year: 2008 PMID: 18710882 PMCID: PMC2566894 DOI: 10.1093/nar/gkn482
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The information types in the datasets used to construct the dominant cancer gene classifiers (Onc-C) and recessive cancer gene classifiers (TSG-C), with the total number of genes in each dataset and the numbers of cancer dominant (CD) and cancer recessive (CR) genes among them
| Dataset | Information | Total | CD | CR |
|---|---|---|---|---|
| PC–GS | Protein conservation (PC), Gene structure (GS) | 22 125 | 272 | 66 |
| PC–GS–PD | PC, GS and Protein domains (PD) | 16 300 | 259 | 62 |
| PC–GS–PI | PC, GS and Protein interaction (PI) | 14 847 | 266 | 66 |
| PC–GS–RD | PC, GS and Regulatory data (RD) | 13 928 | 238 | 58 |
| PC–GS–PD–PI–RD | PC, GS, PD, PI and RD | 11 560 | 226 | 54 |
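The totals in the table shrink as information types are added, which is consistent with each dataset keeping only genes for which every listed information type is available. A minimal Python sketch of that filtering follows; the intersection rule is an assumption (it is not stated explicitly above), and `build_dataset` and the toy annotations are hypothetical:

```python
def build_dataset(available, required, dominant, recessive):
    """Count genes annotated with every required information type.

    available: dict mapping gene -> set of information types (e.g. {"PC", "GS", "PD"}).
    required: set of information types a dataset needs (hypothetical encoding).
    dominant, recessive: sets of CD and CR gene identifiers.
    Returns (total, n_cd, n_cr) for the filtered gene set.
    """
    # Assumption: a gene enters a dataset only if all required types are present.
    genes = {gene for gene, types in available.items() if required <= types}
    return len(genes), len(genes & dominant), len(genes & recessive)


# Toy annotations (illustrative only, not data from the study).
available = {
    "G1": {"PC", "GS", "PD", "PI", "RD"},
    "G2": {"PC", "GS", "PD"},
    "G3": {"PC", "GS"},
    "G4": {"PC", "GS", "PI", "RD"},
}
dominant, recessive = {"G1", "G3"}, {"G2"}

for name, required in [
    ("PC-GS", {"PC", "GS"}),
    ("PC-GS-PD", {"PC", "GS", "PD"}),
    ("PC-GS-PD-PI-RD", {"PC", "GS", "PD", "PI", "RD"}),
]:
    print(name, build_dataset(available, required, dominant, recessive))
```

Under this rule, adding a requirement can only remove genes, so the totals (and the CD and CR counts) can never grow from one row to a row that requires a superset of its information types.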
Figure 1. Schematic representation of the process used to construct the classifiers and obtain the probability ranks. (a) For the Onc-C, the set of genes labelled as CD is used as the positive set and the rest as unlabelled. For the TSG-C, the set of CR genes is used as the positive set and the rest as unlabelled. (b) Next, the Averaged Positive Naïve Bayes method is applied to these sets with the corresponding property sets (PC–GS, PC–GS–PI, etc.). (c) The classifiers are obtained and applied to all genes (d) in order to obtain a probability rank for each human gene (e).
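Figure 1 names the Averaged Positive Naïve Bayes method, which is not reproduced here. As a rough illustration of positive-unlabelled learning in the same family, the Python sketch below implements a plain Positive Naive Bayes classifier on discrete features and then scores and sorts every gene, as in steps (c) to (e). The proportion of positives hidden in the unlabelled set (`prior`) is supplied as an explicit assumption; the function name, the Laplace smoothing and the toy gene features are all hypothetical:

```python
import math
from collections import Counter


def positive_naive_bayes(positives, unlabelled, prior, alpha=1.0, floor=1e-6):
    """Plain Positive Naive Bayes on discrete features (not the averaged variant).

    positives, unlabelled: lists of equal-length tuples of discrete feature values.
    prior: assumed fraction of positives among the unlabelled genes (0 < prior < 1).
    Returns a function mapping a feature tuple to P(positive | x).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    n_feat = len(positives[0])
    pos_tables, neg_tables = [], []
    for j in range(n_feat):
        values = {x[j] for x in positives} | {x[j] for x in unlabelled}
        k = len(values)
        pos_counts = Counter(x[j] for x in positives)
        unl_counts = Counter(x[j] for x in unlabelled)
        # Laplace-smoothed P(x_j = v | positive) and P(x_j = v | unlabelled).
        p_pos = {v: (pos_counts[v] + alpha) / (len(positives) + alpha * k) for v in values}
        p_unl = {v: (unl_counts[v] + alpha) / (len(unlabelled) + alpha * k) for v in values}
        # Unlabelled = prior * positive + (1 - prior) * negative; solve for negative,
        # clip at a small floor, then renormalise so the table sums to 1.
        raw_neg = {v: max((p_unl[v] - prior * p_pos[v]) / (1 - prior), floor) for v in values}
        total = sum(raw_neg.values())
        pos_tables.append(p_pos)
        neg_tables.append({v: raw_neg[v] / total for v in values})

    def predict(x):
        # Log space avoids underflow when there are many features.
        log_pos = math.log(prior) + sum(math.log(pos_tables[j].get(x[j], floor)) for j in range(n_feat))
        log_neg = math.log(1 - prior) + sum(math.log(neg_tables[j].get(x[j], floor)) for j in range(n_feat))
        m = max(log_pos, log_neg)
        return math.exp(log_pos - m) / (math.exp(log_pos - m) + math.exp(log_neg - m))

    return predict


# Toy features per gene: (conservation level, domain class); illustrative only.
positives = [("high", "kinase"), ("high", "kinase"), ("high", "other")]
unlabelled = [("low", "other"), ("low", "other"), ("high", "kinase"),
              ("low", "kinase"), ("low", "other"), ("high", "other")]
predict = positive_naive_bayes(positives, unlabelled, prior=0.1)

# Steps (d)-(e): score every gene and sort from most to least likely.
all_genes = {"GENE_A": ("high", "kinase"), "GENE_B": ("low", "other"), "GENE_C": ("high", "other")}
ranking = sorted(all_genes, key=lambda g: predict(all_genes[g]), reverse=True)
print(ranking)
```

The fixed `prior` is the main simplification: the result depends on it, which is one reason a method that does not require it to be chosen by hand can be preferable.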
Mean ranked probabilities obtained by the dominant cancer gene classifier (Onc-C) and the recessive cancer gene classifier (TSG-C) for different sets of genes generated using different property sets
| Property set | Onc-C: Unlabelled | Onc-C: CR | Onc-C: DD | Onc-C: DR | Onc-C: BC | Onc-C: CC | Onc-C: NonCAN | TSG-C: Unlabelled | TSG-C: CD | TSG-C: DD | TSG-C: DR | TSG-C: BC | TSG-C: CC | TSG-C: NonCAN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PC–GS | 0.50 | **0.71** | 0.63 | 0.63 | 0.78 | 0.78 | 0.69 | 0.50 | **0.61** | 0.56 | 0.62 | 0.75 | 0.74 | 0.64 |
| PC–GS–PD | 0.50 | **0.65** | 0.57 | 0.55 | 0.73 | 0.73 | 0.63 | 0.50 | **0.59** | 0.53 | 0.62 | 0.76 | 0.74 | 0.63 |
| PC–GS–PI | 0.49 | **0.78** | 0.60 | 0.55 | 0.73 | 0.74 | 0.63 | 0.50 | **0.67** | 0.58 | 0.59 | 0.76 | 0.75 | 0.64 |
| PC–GS–RD | 0.50 | **0.58** | 0.58 | 0.46 | 0.69 | 0.76 | 0.61 | 0.50 | **0.59** | 0.52 | 0.58 | 0.75 | 0.73 | 0.62 |
| PC–GS–PD–PI–RD | 0.49 | **0.71** | 0.59 | 0.46 | 0.70 | 0.74 | 0.61 | 0.50 | **0.66** | 0.55 | 0.56 | 0.73 | 0.71 | 0.62 |
CD = cancer dominant, CR = cancer recessive, DD = disease dominant, DR = disease recessive, BC = breast cancer candidate genes, CC = colon cancer candidate genes, NonCAN = genes with mutations that are not candidate cancer genes.
Cancer dominant (CD) and cancer recessive (CR) columns are set in bold.
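The mean ranked probabilities are easier to read with rank normalization in mind: if each gene's probability is replaced by its normalized rank among all scored genes, the average over all genes is exactly 0.5, which is consistent with the Unlabelled columns sitting at about 0.50. The Python sketch below assumes that normalization (0 for the lowest-scoring gene, 1 for the highest, tied genes sharing their average position); the precise transformation used by the authors is not given in this section, and the function names are hypothetical:

```python
import math


def ranked_probabilities(probs):
    """Map each gene's probability to a normalized rank in [0, 1] (1 = most likely).

    probs: dict gene -> classifier probability. Tied genes share their average rank.
    """
    genes = sorted(probs, key=lambda g: probs[g])  # ascending probability
    n = len(genes)
    ranks = {}
    i = 0
    while i < n:
        # Extend j over the block of genes tied with genes[i].
        j = i
        while j + 1 < n and probs[genes[j + 1]] == probs[genes[i]]:
            j += 1
        avg_position = (i + j) / 2
        for k in range(i, j + 1):
            ranks[genes[k]] = avg_position / (n - 1) if n > 1 else 0.5
        i = j + 1
    return ranks


def mean_ranked_probability(ranks, gene_set):
    """Average normalized rank of the genes in gene_set that were scored."""
    members = [ranks[g] for g in gene_set if g in ranks]
    return sum(members) / len(members) if members else math.nan


probs = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.5, "E": 0.2}
ranks = ranked_probabilities(probs)
print(mean_ranked_probability(ranks, {"A", "C"}))  # a hypothetical "positive" set
print(mean_ranked_probability(ranks, probs))       # all genes
```

Because ranks ignore the absolute scale of the probabilities, columns from classifiers trained on different property sets can be compared directly.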
Figure 2. Distribution of ranked probabilities for Cancer Gene Census dominant and recessive cancer genes and for unlabelled genes (the rest of the human genes), using different datasets, for the proto-oncogene (left) and tumour suppressor gene (right) classifiers.
Mean ranked probabilities and standard deviations for one hundred randomly selected positive and unlabelled datasets, using the variables from the dominant cancer gene classifier (Onc-C) and the recessive cancer gene classifier (TSG-C). The mean ranked probabilities of the cancer dominant (CD) and cancer recessive (CR) sets under these random classifiers are also included.
| Property set | Positive mean | Positive SD | Unlabelled mean | Unlabelled SD | CD mean | CR mean |
|---|---|---|---|---|---|---|
| Onc-C | ||||||
| PC–GS | 0.49 | 0.03 | 0.50 | 0.00 | 0.51 | 0.52 |
| PC–GS–PD | 0.49 | 0.03 | 0.50 | 0.00 | 0.55 | 0.48 |
| PC–GS–PI | 0.50 | 0.02 | 0.50 | 0.00 | 0.50 | 0.51 |
| PC–GS–RD | 0.49 | 0.03 | 0.50 | 0.00 | 0.48 | 0.50 |
| PC–GS–PD–PI–RD | 0.50 | 0.03 | 0.50 | 0.00 | 0.56 | 0.53 |
| TSG-C | ||||||
| PC–GS | 0.48 | 0.06 | 0.50 | 0.00 | 0.50 | 0.50 |
| PC–GS–PD | 0.47 | 0.06 | 0.50 | 0.00 | 0.51 | 0.57 |
| PC–GS–PI | 0.48 | 0.06 | 0.50 | 0.00 | 0.53 | 0.55 |
| PC–GS–RD | 0.48 | 0.06 | 0.50 | 0.00 | 0.52 | 0.51 |
| PC–GS–PD–PI–RD | 0.49 | 0.05 | 0.50 | 0.00 | 0.54 | 0.61 |
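The control above can be mimicked by rerunning a pipeline with randomly chosen "positive" genes: when the features carry no signal about a random set, its members should average a ranked probability near 0.5, as the Positive mean column shows. The Python sketch below assumes random positive sets of fixed size drawn from all genes, and reports the mean and standard deviation of the positives' mean ranked probability across repeats. `random_control`, the `train_and_rank` callable and the label-blind demonstration ranker are hypothetical stand-ins for the real classifiers:

```python
import random
import statistics


def random_control(genes, n_positive, train_and_rank, n_repeats=100, seed=0):
    """Null control: train on random 'positive' sets and summarise their ranks.

    genes: list of gene identifiers.
    n_positive: size of each random positive set (e.g. the size of the CD set).
    train_and_rank: callable(positives, unlabelled) -> dict gene -> ranked probability.
    Returns (mean, standard deviation) across repeats of the random positives'
    mean ranked probability.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    per_repeat = []
    for _ in range(n_repeats):
        positives = set(rng.sample(genes, n_positive))
        unlabelled = set(genes) - positives
        ranks = train_and_rank(positives, unlabelled)
        per_repeat.append(statistics.mean([ranks[g] for g in positives]))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)


def label_blind_ranker(positives, unlabelled):
    """Hypothetical stand-in classifier that ignores the labels entirely."""
    all_genes = sorted(positives | unlabelled)
    n = len(all_genes)
    return {gene: i / (n - 1) for i, gene in enumerate(all_genes)}


genes = [f"g{i:04d}" for i in range(1000)]
mean, sd = random_control(genes, n_positive=50, train_and_rank=label_blind_ranker)
print(f"positive mean {mean:.3f}, sd {sd:.3f}")
```

In a real run, `train_and_rank` would wrap the classifier and the rank normalization; the same per-repeat ranks could also be averaged over the CD and CR sets to fill the last two columns.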