Ivan P Gorlov, Christopher J Logothetis, Shenying Fang, Olga Y Gorlova, Christopher Amos.
Abstract
More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites.
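To make the modeling approach in the abstract concrete, the following is a minimal sketch of a binary logistic regression classifier fitted to gene-level features. It is not the authors' implementation: the data are synthetic, the two features (`is_kinase` and `enrichment`) are hypothetical stand-ins for the 33 predictors, and the plain gradient-ascent fit is an assumption chosen only to keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (intercept first) by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p  # gradient of the log-likelihood w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Predicted probability that a gene with features x belongs to the positive class."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic toy data (hypothetical, for illustration only):
# one binary annotation and one continuous expression-like score per gene.
random.seed(0)
X, y = [], []
for _ in range(200):
    is_kinase = 1.0 if random.random() < 0.3 else 0.0
    enrichment = random.gauss(0.0, 1.0)
    true_logit = -2.0 + 1.5 * is_kinase + 1.0 * enrichment
    label = 1 if random.random() < sigmoid(true_logit) else 0
    X.append([is_kinase, enrichment])
    y.append(label)

w = fit_logistic(X, y)

# Rank a few hypothetical genes by predicted probability, highest first.
candidates = {"geneA": [1.0, 2.0], "geneB": [0.0, 0.0], "geneC": [0.0, -2.0]}
ranked = sorted(candidates, key=lambda g: predict_proba(w, candidates[g]), reverse=True)
print(ranked)
```

Sorting all genes by predicted probability in this way is one plausible route to a ranked candidate list such as the top 200 mentioned in the abstract, although the source does not describe the ranking procedure in this excerpt.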
Year: 2012 PMID: 23166609 PMCID: PMC3499550 DOI: 10.1371/journal.pone.0049175
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Variables used to build a binary logistic model to discriminate known PCa genes.
| Type of variable | Variable | Source of the data |
| Specific | Three-level meta-analysis | Ref. |
| Nonspecific | Acetylated | GO |
| Nonspecific | Angiogenesis | GO |
| Nonspecific | Antiapoptotic | GO |
| Nonspecific | Cell adhesion | GO |
| Nonspecific | Cell proliferation | GO |
| Nonspecific | Chromatin remodeling | GO |
| Specific | Difference in expression –LOG(P) | Refs. |
| Nonspecific | DNA repair | GO |
| Nonspecific | DNA replication | GO |
| Nonspecific | Evolutionary conservation index | HomoloGene |
| Nonspecific | Expression level in normal prostate | Ref. |
| Nonspecific | Extracellular space | GO |
| Nonspecific | Growth factors | GO |
| Nonspecific | Housekeeping gene | Ref. |
| Nonspecific | Kinases | GO |
| Specific | Mean expression in adjacent tissue | Refs. |
| Specific | Mean expression in tumor tissue | Refs. |
| Specific | Meta-analysis of the gene expression | Ref. |
| Nonspecific | Methylated | GO |
| Nonspecific | Phosphatases | GO |
| Nonspecific | Phosphorylated | GO |
| Nonspecific | Plasma membrane | GO |
| Specific | Prostate-specific expression (enrichment score) | Ref. |
| Nonspecific | Secreted | GO |
| Nonspecific | Signal transduction | GO |
| Nonspecific | Sumoylated | GO |
| Nonspecific | Transcription | GO |
| Nonspecific | Transcription factors | GO |
| Nonspecific | Translation | GO |
| Nonspecific | Ubiquitinated | GO |
| Specific | Variance in adjacent tissue | Refs. |
| Specific | Variance in tumor tissue | Refs. |
GO, Gene Ontology database [27], [28].
HomoloGene Database: http://www.ncbi.nlm.nih.gov/homologene.
Variables significant in the multivariable binary logistic regression model with putative PCa genes excluded.
| Variable | B | SE | χ2 | df | P |
| Prostate-specific expression (enrichment score) | 0.313 | 0.039 | 66.116 | 1 | <0.001 |
| Kinases | 1.929 | 0.333 | 33.647 | 1 | <0.001 |
| Variance in adjacent tissue | 0.68 | 0.131 | 27.097 | 1 | <0.001 |
| Phosphatases | 2.486 | 0.483 | 26.469 | 1 | <0.001 |
| Growth factors | 1.818 | 0.453 | 16.132 | 1 | <0.001 |
| Meta-analysis of the gene expression | 0.143 | 0.037 | 15.226 | 1 | <0.001 |
| Transcription factors | 1.201 | 0.326 | 13.562 | 1 | <0.001 |
| Antiapoptotic | 1.497 | 0.415 | 13.043 | 1 | <0.001 |
| Extracellular space | 0.91 | 0.303 | 9.05 | 1 | 0.003 |
| Signal transduction | 0.781 | 0.272 | 8.269 | 1 | 0.004 |
| Cell proliferation | 1.131 | 0.396 | 8.154 | 1 | 0.004 |
| Ubiquitinated | 0.574 | 0.244 | 5.542 | 1 | 0.019 |
| Angiogenesis | 1.062 | 0.461 | 5.32 | 1 | 0.021 |
| Acetylated | −0.577 | 0.251 | 5.276 | 1 | 0.022 |
| Cell adhesion | 0.804 | 0.386 | 4.342 | 1 | 0.037 |
| Sumoylated | 0.937 | 0.466 | 4.043 | 1 | 0.044 |
B, regression coefficient; SE, standard error.
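As a reading aid for the table above, the sketch below recomputes the test statistic and an odds ratio from B and SE for a few rows. It assumes that the χ2 column is a Wald statistic, (B/SE)², with 1 degree of freedom, which the excerpt does not state explicitly; the helper names are illustrative. Because B and SE are rounded to three decimals in the table, the recomputed values are expected to match only approximately.

```python
import math

# (variable, B, SE) copied from the table above
rows = [
    ("Prostate-specific expression (enrichment score)", 0.313, 0.039),
    ("Kinases", 1.929, 0.333),
    ("Phosphatases", 2.486, 0.483),
    ("Acetylated", -0.577, 0.251),
]


def wald_chi2(b, se):
    """Wald chi-square statistic for a single coefficient (assumed definition)."""
    return (b / se) ** 2


def chi2_p_df1(x):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def odds_ratio(b):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(b)


for name, b, se in rows:
    chi2 = wald_chi2(b, se)
    print(f"{name}: chi2={chi2:.2f}, p={chi2_p_df1(chi2):.3g}, OR={odds_ratio(b):.2f}")
```

Under this reading, a positive B such as the one for kinases corresponds to an odds ratio above 1 (exp(1.929) is about 6.9), while the negative B for acetylation corresponds to an odds ratio below 1, consistent with the abstract's statement that PCa gene products are likely not to be acetylated.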
Figure 1. Variables that discriminate genes with recurrent somatic mutations in prostate tumors from all other genes.
The vertical line represents the threshold for statistical significance.