Sigrid Rouam, Lance D Miller, R Krishna Murthy Karuturi.
Abstract
Driver genes are directly responsible for oncogenesis, and identifying them is essential to fully understand the mechanisms of cancer. However, it is difficult to distinguish them from the larger pool of genes that are deregulated in cancer (i.e., passenger genes). To address this problem, we developed an approach called TRIAngulating Gene Expression (TRIAGE through clinico-genomic intersects). Here, we present a refinement of this approach that incorporates a new scoring methodology to identify putative driver genes that are deregulated in cancer. TRIAGE triangulates, or integrates, three levels of information: gene expression, gene location, and patient survival. First, TRIAGE identifies regions of deregulated expression (i.e., expression footprints) by deriving a newly established measure, the Local Singular Value Decomposition (LSVD) score, for each locus. Driver genes are then distinguished from passenger genes using dual survival analyses. These analyses are performed on the genes located in significant expression footprints, using measurements of gene expression weighted according to the LSVD weight of each tumor. We first use simulated data to characterize the newly established LSVD score. We then present the results of applying this refined version of TRIAGE to gene expression data from five cancer types. The refined version of TRIAGE not only identified known prominent driver genes, such as MMP1, IL8, and COL1A2, but also several novel ones. These results illustrate that TRIAGE complements existing tools, allows for the identification of genes that drive cancer, and could perhaps elucidate potential future targets of novel anticancer therapeutics.
Keywords: cancer; data mining; driver genes; gene expression; survival
Year: 2015 PMID: 25949096 PMCID: PMC4354331 DOI: 10.4137/CIN.S18302
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 6. Overview of the TRIAGE methodology.
Parameter settings used in the six simulated configurations.
| CONFIGURATION | % OF TUMORS | NUMBER OF GENES IN THE FOOTPRINT | MEAN GENE EXPRESSION (INDUCED) | MEAN GENE EXPRESSION (REPRESSED) | WINDOW SIZE (W) | THRESHOLD (A) | CORRELATION |
|---|---|---|---|---|---|---|---|
| (i) | 5 to 80% | 20 | 3 | −3 | 5 | 1.5 | 0 |
| (ii) | 20% | 5 to 100 | 3 | −3 | 5 | 1.5 | 0 |
| (iii) | 20% | 20 | 1.5 to 4 | −4 to −1.5 | 5 | 1.5 | 0 |
| (iv) | 20% | 20 | 3 | −3 | 5 to 20 | 1.5 | 0 |
| (v) | 20% | 20 | 3 | −3 | 5 | 1 to 2 | 0 |
| (vi) | 20% | 20 | 3 | −3 | 5 | 1.5 | 0 to 0.75 |
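As a rough illustration of what one such configuration could look like in practice, the sketch below generates a synthetic expression matrix with a single induced footprint, using the baseline values from the table (20% of tumors, 20 footprint genes, mean expression 3, correlation 0). It is only a sketch under stated assumptions: the function name `simulate_expression`, the number of tumors (100), the unit-variance Gaussian noise, and the choice of a contiguous block of loci are all hypothetical and are not taken from the paper. The LSVD score itself is not implemented here.

```python
import random


def simulate_expression(n_genes=1000, n_tumors=100, footprint_size=20,
                        tumor_fraction=0.20, mean_shift=3.0, seed=0):
    """Simulate a genes-by-tumors expression matrix with one footprint.

    Assumptions (not from the paper): background values are N(0, 1),
    footprint genes are contiguous loci, and genes are uncorrelated.
    """
    rng = random.Random(seed)

    # Pick the subset of tumors that carry the deregulated footprint.
    n_affected = round(tumor_fraction * n_tumors)
    affected = sorted(rng.sample(range(n_tumors), n_affected))

    # Pick a contiguous run of loci to act as the expression footprint.
    start = rng.randrange(n_genes - footprint_size + 1)
    footprint = list(range(start, start + footprint_size))

    # Independent Gaussian background for every gene and tumor.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(n_tumors)]
              for _ in range(n_genes)]

    # Shift footprint genes in the affected tumors (use a negative
    # mean_shift to mimic a repressed footprint instead).
    for g in footprint:
        for t in affected:
            matrix[g][t] += mean_shift

    return matrix, footprint, affected


matrix, footprint, affected = simulate_expression()
```

Varying `tumor_fraction`, `footprint_size`, or `mean_shift` one at a time, while holding the others at their baseline, mirrors the structure of configurations (i) to (iii).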
Figure 1. Graph (A) and boxplot (B) of the LSVD scores for 1,000 overexpressed simulated genes, for varying percentages of tumor samples contributing to the expression footprint, across 200 repetitions.
Figure 2. Graph (A) and boxplot (B) of the score value for 1,000 overexpressed simulated genes, for different expression footprint sizes, across 200 repetitions.
Figure 3. Graphs of the score value for 1,000 overexpressed (A) and underexpressed (B) simulated genes, for different expression levels, across 200 repetitions.
Description of the different cancer studies. The two rightmost columns give the number of putative driver genes identified by TRIAGE.
| GEO ID | CANCER TYPE | PLATFORM | SAMPLE SIZE | NO. OF SAMPLES WITH SURVIVAL DATA | SURVIVAL OUTCOME | REF. | NO. OF POTENTIAL ONCOGENES | NO. OF POTENTIAL TUMOR SUPPRESSORS |
|---|---|---|---|---|---|---|---|---|
| GSE9891 | Ovarian | HU133Plus2.0 | 295 | 220 | RFS | [29] | 171 | 227 |
| GSE16011 | Glioma | HU133Plus2.0 | 284 | 266 | OS | [30] | 985 | 784 |
| GSE17538 | Colon | HU133Plus2.0 | 232 | 232 | RFS | [31] | 71 | 63 |
| GSE3141 | Lung | HU133Plus2.0 | 111 | 111 | OS | [32] | 46 | 26 |
| Combined study | Breast | HU133A+B | 741 | 624 | DMFS | [33] | 445 | 118 |
Notes: The combined breast study comprises GSE3494, GSE1456, GSE6532, and GSE4922.
Figure 4. Number of genes in common among different cancer types.
Genes common to the five cancer studies.
| CHR. | GENE SYMBOL | GENE NAME | GSE17536 COLON | GSE16011 GLIOMA | GSE3494 BREAST | GSE3141 LUNG | GSE9891 OVARIAN | TOTAL |
|---|---|---|---|---|---|---|---|---|
| 4q | IL8 | Interleukin 8 | 1* | 1* | 1* | 1* | 0* | 4 |
| 7q | COL1A2 | Collagen, type I, alpha 2 | 0* | 1 | 0 | 1 | 1 | 3 |
| 9q | ASPN | Asporin | 1 | 1 | 0 | 0 | 1 | 3 |
| 11q | MMP1 | Matrix metallopeptidase 1 (interstitial collagenase) | 1* | 1* | 1* | 0* | 0* | 3 |
| 1q | RPTN | Repetin | 0 | 1 | 0 | 1 | 0 | 2 |
| 1q | S100A2 | S100 calcium binding protein A2 | 1* | 1 | 0* | 0* | 0 | 2 |
| 3q | PLSCR4 | Phospholipid scramblase 4 | 0 | 1 | 0 | 0 | 1 | 2 |
| 3q | TM4SF1 | Transmembrane 4 L six family member 1 | 1* | 1 | 0* | 0* | 0* | 2 |
| 4q | LOC255130 | Uncharacterized | 0 | 1 | 0 | 1 | 0 | 2 |
| 4q | CXCL6 | Chemokine (C-X-C motif) ligand 6 (granulocyte chemotactic protein 2) | 0* | 1 | 0* | 1* | 0 | 2 |
| 4q | CXCL3 | Chemokine (C-X-C motif) ligand 3 | 0* | 1 | 0 | 1 | 0 | 2 |
| 4q | AREG | Amphiregulin (schwannoma-derived growth factor) | 1* | 1* | 0* | 0* | 0* | 2 |
| 4q | C4orf46 | Chromosome 4 open reading frame 46 | 0 | 1 | 1 | 0 | 0 | 2 |
| 5q | FST | Follistatin | 0 | 1 | 0 | 0* | 1* | 2 |
| 5q | GPX8 | Glutathione peroxidase 8 (putative) | 0 | 1 | 0 | 0 | 1 | 2 |
| 5q | C5orf46 | Chromosome 5 open reading frame 46 | 1 | 1 | 0 | 0 | 0 | 2 |
| 6p | F13A1 | Coagulation factor XIII, A1 polypeptide | 0 | 1 | 0 | 0 | 1 | 2 |
| 6p | LY86 | Lymphocyte antigen 86 | 0 | 1 | 0 | 0 | 1 | 2 |
| 6p | HIST1H1D | Histone cluster 1, H1d | 0 | 1 | 1 | 0 | 0 | 2 |
| 6p | HIST1H2AL | Histone cluster 1, H2al | 0 | 1 | 0 | 1 | 0 | 2 |
| 6q | EYA4 | Eyes absent homolog 4 (Drosophila) | 0 | 1 | 0 | 1 | 0 | 2 |
| 7p | SKAP2 | Src kinase associated phosphoprotein 2 | 0 | 1 | 0 | 0 | 1 | 2 |
| 7p | HOXA3 | Homeobox A3 | 0 | 1 | 0 | 0 | 1 | 2 |
| 7p | HOXA-AS2 | HOXA cluster antisense RNA 2 (non-protein coding) | 0 | 1 | 0 | 0 | 1 | 2 |
| 7p | HOXA4 | Homeobox A4 | 0 | 1 | 0 | 0 | 1* | 2 |
| 7p | HOXA5 | Homeobox A5 | 0 | 1 | 0* | 0* | 1 | 2 |
| 7p | TAX1BP1 | Tax1 (human T-cell leukemia virus type I) binding protein 1 | 0 | 1 | 0 | 1 | 0 | 2 |
| 7q | HGF | Hepatocyte growth factor (hepapoietin A; scatter factor) | 0* | 1* | 0* | 0* | 1* | 2 |
| 7q | GNG11 | Guanine nucleotide binding protein (G protein), gamma 11 | 1 | 1 | 0 | 0 | 0 | 2 |
| 8p | DLC1 | Deleted in liver cancer 1 | 0* | 1 | 0* | 0* | 1 | 2 |
| 8q | ANGPT1 | Angiopoietin 1 | 0* | 1* | 0* | 0* | 1* | 2 |
| 8q | GPR172A | G protein-coupled receptor 172A | 0 | 1 | 1 | 0 | 0 | 2 |
| 9q | ECM2 | Extracellular matrix protein 2, female organ and adipocyte specific | 0 | 1 | 0 | 0 | 1 | 2 |
| 10q | ZWINT | ZW10 interactor | 0 | 1 | 1 | 0 | 0 | 2 |
| 10q | ACSL5 | Acyl-CoA synthetase long-chain family member 5 | 0* | 1* | 0 | 0 | 1 | 2 |
| 11p | HRAS | V-Ha-ras Harvey rat sarcoma viral oncogene homolog | 0* | 1* | 1* | 0* | 0* | 2 |
| 11p | FIBIN | Fin bud initiation factor | 0 | 1 | 0 | 0 | 1 | 2 |
| 11p | LGR4 | Leucine-rich repeat-containing G protein-coupled receptor 4 | 0 | 0 | 1 | 0 | 1 | 2 |
| 11q | CFL1 | Cofilin 1 (non-muscle) | 0* | 0* | 1* | 1* | 0* | 2 |
| 11q | PPP1CA | Protein phosphatase 1, catalytic subunit, alpha isoform | 0 | 1 | 1* | 0 | 0 | 2 |
| 11q | PDGFD | Platelet derived growth factor D | 0 | 1* | 0 | 0* | 1* | 2 |
| 11q | CASP4 | Caspase 4, apoptosis-related cysteine peptidase | 0 | 1 | 0* | 0* | 1* | 2 |
| 12p | NDUFA9 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 9, 39 kDa | 0 | 1 | 1 | 0 | 0 | 2 |
| 12p | OLR1 | Oxidized low density lipoprotein (lectin-like) receptor 1 | 0 | 1 | 0* | 0 | 1 | 2 |
| 12p | GABARAPL1 | GABA(A) receptor-associated protein like 1 | 1 | 0 | 0* | 0 | 1 | 2 |
| 12q | HOXC13 | Homeobox C13 | 0 | 1 | 1 | 0 | 0 | 2 |
| 12q | HOTAIR | Hox transcript antisense intergenic RNA | 0* | 1 | 1* | 0 | 0 | 2 |
| 12q | RAP1B | RAP1B, member of RAS oncogene family | 0 | 1* | 1 | 0 | 0 | 2 |
| 12q | ATP5H | ATP synthase, H+ transporting, mitochondrial f0 complex, subunit d | 0 | 1 | 1 | 0 | 0 | 2 |
| 12q | NUP107 | Nucleoporin 107 kDa | 0 | 1 | 1 | 0 | 0 | 2 |
| 12q | MDM2 | Mdm2, transformed 3T3 cell double minute 2, p53 binding protein (mouse) | 0* | 1* | 1* | 0* | 0* | 2 |
| 12q | LUM | Lumican | 1* | 0 | 0* | 0* | 1 | 2 |
| 12q | DCN | Decorin | 1* | 0* | 0* | 0* | 1* | 2 |
| 13q | MIR1244-1 | MicroRNA 1244 | 0 | 1 | 1 | 0 | 0 | 2 |
| 13q | PTMA | Prothymosin, alpha (gene sequence 28) | 0* | 1 | 1 | 0* | 0 | 2 |
| 13q | LOC441454 | Prothymosin, alpha pseudogene | 0 | 1 | 1 | 0 | 0 | 2 |
| 14q | SNORD114-3 | Small nucleolar RNA, C/D box 114-3 | 1 | 0 | 0 | 0 | 1 | 2 |
| 16p | C16orf59 | Chromosome 16 open reading frame 59 | 0 | 0 | 1 | 1 | 0 | 2 |
| 16q | GOT2 | Glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2) | 0 | 1 | 1 | 0 | 0 | 2 |
| 16q | CKLF | Chemokine-like factor | 0 | 1 | 1 | 0 | 0 | 2 |
| 17q | BRIP1 | BRCA1 interacting protein C-terminal helicase 1 | 0* | 1 | 1* | 0 | 0* | 2 |
| 17q | ABCA8 | ATP-binding cassette, sub-family A (ABC1), member 8 | 0 | 1 | 0 | 0 | 1 | 2 |
| 18q | ALPK2 | Alpha-kinase 2 | 0 | 1 | 0 | 0 | 1 | 2 |
| 18q | DSEL | Dermatan sulfate epimerase-like | 0 | 1 | 0 | 0 | 1 | 2 |
| 19q | C19orf48 | Chromosome 19 open reading frame 48 | 0 | 1 | 1 | 0 | 0 | 2 |
| 20q | RPN2 | Ribophorin II | 0 | 1 | 1* | 0 | 0 | 2 |
| 20q | TTI1 | TELO2 interacting protein 1 | 0 | 1 | 1 | 0 | 0 | 2 |
| 22q | POM121L9P | POM121 membrane glycoprotein-like 9, pseudogene | 0 | 1 | 0 | 0 | 1 | 2 |
| 1q | TXNIP | Thioredoxin interacting protein | 0* | 1 | 1* | 0 | 0 | 2 |
| 1q | TTC13 | Tetratricopeptide repeat domain 13 | 0 | 1 | 0 | 0 | 1 | 2 |
| 2q | MRPL30 | Mitochondrial ribosomal protein L30 | 0 | 1 | 0 | 0 | 1 | 2 |
| 4p | CCDC96 | Coiled-coil domain containing 96 | 1 | 0 | 0 | 0 | 1 | 2 |
| 5q | DMXL1 | Dmx-like 1 | 0 | 1 | 0 | 1 | 0 | 2 |
| 6p | PPP2R5D | Protein phosphatase 2, regulatory subunit B’, delta isoform | 0 | 1 | 0 | 0 | 1 | 2 |
| 8q | ZNF704 | Zinc finger protein 704 | 1 | 1 | 0 | 0 | 0 | 2 |
| 8q | PAG1 | Phosphoprotein associated with glycosphingolipid microdomains 1 | 1 | 1 | 0 | 0* | 0 | 2 |
| 10q | ANK3 | Ankyrin 3, node of Ranvier (ankyrin G) | 0 | 1 | 0 | 1 | 0 | 2 |
| 11q | PITPNM1 | Phosphatidylinositol transfer protein, membrane-associated 1 | 1 | 0 | 0 | 0 | 1* | 2 |
| 12q | SET | SET translocation (myeloid leukemia-associated) | 0* | 1 | 0 | 0 | 1 | 2 |
| 17p | LOC284014 | Uncharacterized LOC284014 | 0 | 1 | 0 | 0 | 1 | 2 |
| 17p | ZFP3 | Zinc finger protein 3 homolog (mouse) | 0 | 1 | 0 | 0 | 1 | 2 |
| 17p | C17orf81 | Chromosome 17 open reading frame 81 | 0 | 1 | 0 | 0 | 1 | 2 |
| 17p | CYB5D1 | Cytochrome b5 domain containing 1 | 1 | 0 | 0 | 0 | 1 | 2 |
| 18q | TCF4 | Transcription factor 4 | 0* | 1* | 1* | 0* | 0 | 2 |
| 19p | CFD | Complement factor D (adipsin) | 0 | 0 | 1 | 1 | 0 | 2 |
| 21p | LOC100132288 | NA | 0 | 1 | 1 | 0 | 0 | 2 |
| 21p | LOC389834 | Hypothetical gene supported by AK123403 | 0 | 1 | 1 | 0 | 0 | 2 |
Notes: For each study, 1 indicates that the gene was selected by the TRIAGE methodology and 0 indicates that it was not. A star (*) indicates that the gene has been shown to be associated with the disease (according to Genecards, www.genecards.org). The last column of the table gives the number of studies in which the gene was identified as a potential driver. Details on hazard ratios and P-values are provided in Additional Files 4 and 5.
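The cell encoding described in the notes can be read mechanically. The minimal sketch below, which assumes the cells are available as plain strings, splits each cell into a selection flag and a known-association flag and recomputes the TOTAL column for three rows copied from the table. The helper names `parse_cell` and `total_selected` are illustrative only.

```python
# Indicator cells copied from the table, in column order:
# colon, glioma, breast, lung, ovarian.
rows = {
    "IL8":    ["1*", "1*", "1*", "1*", "0*"],
    "COL1A2": ["0*", "1", "0", "1", "1"],
    "MMP1":   ["1*", "1*", "1*", "0*", "0*"],
}


def parse_cell(cell):
    """Split a cell such as '1*' into (selected, known_association)."""
    selected = cell.rstrip("*") == "1"
    known_association = cell.endswith("*")
    return selected, known_association


def total_selected(cells):
    """Count the studies in which TRIAGE selected the gene."""
    return sum(parse_cell(cell)[0] for cell in cells)


totals = {gene: total_selected(cells) for gene, cells in rows.items()}
print(totals)
```

Note that a star records a known disease association regardless of whether TRIAGE selected the gene in that study, so the two flags are counted independently.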
Figure 5. Heatmap representation of the LSVD score for the windows centered on (A) MMP1, (B) IL8, (C) COL1A2.
Note: A star (*) indicates that the gene has been shown to be associated with the cancer (according to Genecards, www.genecards.org).