Richard Newton, Lorenz Wernisch
Abstract
Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention are often required to unravel causal relationships unambiguously. However, gene copy number changes, which occur frequently in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets from matched array comparative genomic hybridisation (aCGH) and transcriptomics experiments on a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets, and for each of these genes we present predicted trans-acting regulatory relationships. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments.
Year: 2014 PMID: 25148247 PMCID: PMC4141782 DOI: 10.1371/journal.pone.0105522
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Details of the 30 data sets used in the meta-analysis.
| Code | GEO | Publication | N | P | Pathology |
| parr | GSE20486 | Parris et al. 2010 | 97 | 18616 | Breast Cancer (Diploid) |
| crow | GSE15134 | Crowder et al. 2009 | 31 | 16153 | Breast Cancer (ER+) |
| sirc | GSE17907 | Sircoulomb et al. 2010 | 51 | 14689 | Breast Cancer (ERBB2 amplified) |
| myll | | Myllykangas et al. 2008 | 46 | 17050 | Gastric Cancer |
| junn | | Junnila et al. 2010 | 10 | 16844 | Gastric Cancer |
| ch.w | | Chitale et al. 2009 | 91 | 10285 | Lung adenocarcinoma |
| ch.s | | Chitale et al. 2009 | 94 | 10285 | Lung adenocarcinoma |
| hoac | GSE20154 | Goh et al. 2011 | 54 | 14388 | Oesophageal adenocarcinoma |
| zho | GSE29023 | Zhou et al. 2012 | 115 | 13697 | Multiple Myeloma |
| shai | GSE26089 | Shain et al. 2012 | 68 | 14201 | Pancreatic Cancer |
| vain | GSE28403 | Vainio et al. 2012 | 13 | 10107 | Prostate Cancer |
| bott | GSE29211 | Bott et al. 2011 | 53 | 10321 | Pleural Mesothelioma |
| bekh | GSE23720 | Bekhouche et al. 2011 | 173 | 13682 | Breast Cancer (Inflammatory) |
| chap | GSE26863 | Chapman et al. 2011 | 245 | 13667 | Multiple Myeloma |
| ooi | GSE22785 | Ooi et al. 2012 | 14 | 10091 | Neuroblastoma |
| brag | GSE12668 | Braggio et al. 2009 | 11 | 10310 | Waldenström's Macroglobulinemia |
| jons | GSE22133 | Jönsson et al. 2010 | 356 | 4183 | Breast Cancer |
| mura | GSE24707 | Muranen et al. 2011 | 47 | 4472 | Breast Cancer |
| lin1 | GSE19915 | Lindgren et al. 2010 | 72 | 4965 | Urothelial Carcinoma |
| beck | GSE17555 | Beck et al. 2010 | 18 | 12174 | Leiomyosarcoma |
| toed | GSE18166 | Toedt et al. 2011 | 74 | 4289 | Astrocytic Gliomas |
| ell | GSE35191 | Ellis et al. 2012 | 124 | 13569 | Breast Cancer |
| gra.1 | GSE35988 | Grasso et al. 2012 | 85 | 12849 | Prostate Cancer |
| gra.2 | GSE35988 | Grasso et al. 2012 | 34 | 12813 | Prostate Cancer |
| lenz | GSE11318 | Lenz et al. 2009 | 203 | 15212 | Lymphoma |
| lin2 | GSE32549 | Lindgren et al. 2012 | 131 | 8450 | Urothelial Carcinoma |
| micc | GSE38230 | Micci et al. 2013 | 12 | 16657 | Vulva Squamous Cell Carcinoma |
| tayl | GSE21032 | Taylor et al. 2010 | 155 | 14572 | Prostate Cancer |
| coco | GSE25711 | Coco et al. 2012 | 36 | 4394 | Neuroblastoma |
| med | GSE14079 | Medina et al. 2009 | 8 | 6376 | Lung Cancer |
GEO = Gene Expression Omnibus data set reference (http://www.ncbi.nlm.nih.gov/geo/), N = number of samples, P = number of matched probes. Other data sources: http://www.cangem.org/, http://cbio.mskcc.org/Public/lung_array_data/, and expression data in ArrayExpress (http://www.ebi.ac.uk/arrayexpress/): E-TABM-38, E-MTAB-161.
Figure 1. Schematic diagram illustrating the key analysis steps.
Top 30 potential regulators that are not transcription factors, based on the Spearman correlation of a gene's aCGH with its own expression, from a meta-analysis of the 30 data sets.
| Gene | Chr | Locus | p-value | N | Annotation |
| PCM1 | 8 | 22-p | 5.9e-05 | 17 | Pericentriolar Material 1 |
| ELP3 | 8 | 21.1p | 5.9e-05 | 17 | Elongator Acetyltransferase Complex Subunit 3 |
| MED4 | 13 | 14.12q | 5.9e-05 | 17 | Mediator complex subunit 4 |
| MCPH1 | 8 | 23.1p | 5.9e-05 | 16 | Microcephalin 1 |
| COPS3 | 17 | 11.2p | 0.0087 | 16 | COP9 constitutive photomorphogenic homolog subunit 3 |
| PREP | 6 | 22q | 5.9e-05 | 15 | Prolyl endopeptidase |
| DDX10 | 11 | 22-q | 5.9e-05 | 15 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 10 |
| BCL9 | 1 | 21q | 5.9e-05 | 15 | B-cell CLL/lymphoma 9 |
| CDC16 | 13 | 34q | 5.9e-05 | 15 | Cell division cycle 16 |
| HDAC2 | 6 | 21q | 5.9e-05 | 15 | Histone deacetylase 2 |
| AZIN1 | 8 | 21.3q | 5.9e-05 | 15 | Antizyme inhibitor 1 |
| SS18L1 | 20 | 13.3q | 5.9e-05 | 14 | Synovial sarcoma translocation gene on chromosome 18-like 1 |
| TGDS | 13 | 32.1q | 5.9e-05 | 14 | TDP-glucose 4,6-dehydratase |
| YTHDF1 | 20 | 13.33q | 5.9e-05 | 14 | YTH domain family, member 1 |
| COG2 | 1 | 42.2q | 5.9e-05 | 14 | Component of oligomeric golgi complex 2 |
| PPP2R2A | 8 | 21.2p | 5.9e-05 | 14 | Protein phosphatase 2, regulatory subunit B, alpha |
| PTDSS1 | 8 | 22q | 5.9e-05 | 14 | Phosphatidylserine synthase 1 |
| AKAP11 | 13 | 14.11q | 5.9e-05 | 14 | A kinase (PRKA) anchor protein 11 |
| IKBKB | 8 | 11.2p | 5.9e-05 | 14 | Inhib. of kappa light polyp. gene enhancer in B-cells, kinase beta |
| MBTPS1 | 16 | 24q | 5.9e-05 | 14 | Membrane-bound transcription factor peptidase, site 1 |
| UCHL3 | 13 | 21.33q | 5.9e-05 | 14 | Ubiquitin carboxyl-terminal esterase L3 (ubiquitin thiolesterase) |
| AARS | 16 | 22q | 5.9e-05 | 14 | Alanyl-tRNA synthetase |
| ATXN10 | 22 | 13q | 5.9e-05 | 14 | Ataxin 10 |
| RAF1 | 3 | 25p | 5.9e-05 | 14 | V-Raf-1 murine leukemia viral oncogene homolog 1 |
| PPP3CC | 8 | 21.3p | 5.9e-05 | 14 | Protein phosphatase 3, catalytic subunit, gamma isozyme |
| TBCE | 1 | 42.3q | 5.9e-05 | 14 | Tubulin folding cofactor E |
| RIPK2 | 8 | 21q | 0.0087 | 14 | Receptor-interacting serine-threonine kinase 2 |
| INTS6 | 13 | 14.3q | 0.0087 | 14 | Integrator complex subunit 6 |
| UBAP2 | 9 | 11.2p | 0.0087 | 14 | Ubiquitin associated protein 2 |
| GNA12 | 7 | 22.3p | 0.0087 | 14 | Guanine nucleotide binding protein (G protein) alpha 12 |
Chr = Chromosome, Locus = Gene locus, p-value = Benjamini-Hochberg (B-H) adjusted p-value, N = number of data sets with significant correlation (B-H adjusted p-value <0.05).
Top 30 potential regulators that are transcription factors, based on the Spearman correlation of a gene's aCGH with its own expression, from a meta-analysis of the 30 data sets.
| Gene | Chr | Locus | p-value | N | Annotation |
| GTF2F2 | 13 | 14q | 5.9e-05 | 16 | General transcription factor IIF, polypeptide 2 |
| TAF2 | 8 | 24q | 5.9e-05 | 14 | TATA box binding protein (TBP)-associated factor |
| SETDB1 | 1 | 21q | 5.9e-05 | 14 | SET domain, bifurcated 1 |
| ELF1 | 13 | 13q | 0.0087 | 14 | E74-like factor 1 (ets domain transcription factor) |
| YWHAZ | 8 | 22.3q | 5.7e-05 | 13 | Tyrosine/tryptophan activation protein, zeta polypeptide |
| PARP1 | 1 | 41-q | 0.0087 | 13 | Poly (ADP-ribose) polymerase 1 |
| ACTL6A | 3 | 26.33q | 0.0087 | 13 | Actin-like 6A |
| PSMB1 | 6 | 27q | 0.0087 | 13 | Proteasome subunit, beta type, 1 |
| SMARCA2 | 9 | 24.3p | 0.0087 | 13 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2 |
| NCOR1 | 17 | 11.2p | 0.0087 | 13 | Nuclear receptor corepressor 1 |
| MAP3K7 | 6 | 15q | 0.0087 | 13 | Mitogen-activated protein kinase kinase kinase 7 |
| HSBP1 | 16 | 23.3q | 5.7e-05 | 12 | Heat shock factor binding protein 1 |
| SMARCE1 | 17 | 21.2q | 5.9e-05 | 12 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e, member 1 |
| POGZ | 1 | 21.1q | 5.9e-05 | 12 | Pogo transposable element with ZNF domain |
| RCOR3 | 1 | 32.3q | 5.9e-05 | 12 | REST corepressor 3 |
| TRIM33 | 1 | 13.1p | 5.9e-05 | 12 | Tripartite motif containing 33 |
| ARID4B | 1 | 42.1-q | 5.9e-05 | 12 | AT rich interactive domain 4B (RBP1-like) |
| MNAT1 | 14 | 23q | 5.9e-05 | 12 | Menage a trois homolog 1, cyclin H assembly factor (X. laevis) |
| NFATC3 | 16 | 22q | 5.9e-05 | 12 | Nucl. factor of activated T-cells, cytoplasmic, calcineurin-dep. 3 |
| TBP | 6 | 27q | 5.9e-05 | 12 | TATA box binding protein |
| AATF | 17 | 12q | 5.9e-05 | 12 | Apoptosis antagonizing transcription factor |
| SMAD2 | 18 | 21q | 5.9e-05 | 12 | SMAD family member 2 |
| AP2B1 | 17 | 11.2-q | 0.0087 | 12 | Adaptor-related protein complex 2, beta 1 subunit |
| SNAPC3 | 9 | 22.3p | 0.0087 | 12 | Small nuclear RNA activating complex, polypeptide 3 |
| SNW1 | 14 | 22.1-q | 0.0087 | 12 | SNW domain containing 1 |
| SMARCC1 | 3 | 21.31p | 0.0087 | 12 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily c, member 1 |
| HSF2 | 6 | 22q | 0.0087 | 12 | Heat shock transcription factor 2 |
| PSIP1 | 9 | 22.2p | 0.0087 | 12 | PC4 and SFRS1 interacting protein 1 |
| RB1 | 13 | 14.2q | 0.0087 | 12 | Retinoblastoma 1 |
| CREBBP | 16 | 13.3p | 0.0087 | 12 | CREB binding protein |
Chr = Chromosome, Locus = Gene locus, p-value = B-H adjusted p-value, N = number of data sets with significant correlation (B-H adjusted p-value<0.05).
Figure 2. Histogram showing the number of genes that are potential regulators in different numbers of data sets.
For each gene, the number of individual data sets in which the Spearman correlation between the gene's aCGH and expression has a B-H adjusted p-value <0.05 is counted; the graph shows a histogram of these counts. Only genes with a combined B-H adjusted p-value <0.05 are included in the histogram.
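The per-gene count behind Figure 2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: each data set is assumed to be a pair of genes × samples NumPy arrays (aCGH and expression) with rows aligned to the same genes, the Spearman test is two-sided (the SciPy default), and the function names are hypothetical. The combined meta-analysis p-value used to filter genes is not reproduced.

```python
import numpy as np
from scipy.stats import spearmanr


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def own_cn_expression_padj(acgh, expr):
    """B-H adjusted p-values of each gene's aCGH vs its own expression.

    acgh, expr: arrays of shape (genes, samples) with matching row order.
    """
    pvals = np.array([spearmanr(a, e).pvalue for a, e in zip(acgh, expr)])
    return bh_adjust(pvals)


def count_significant_datasets(datasets, alpha=0.05):
    """Per gene, the number of data sets with B-H adjusted p-value < alpha.

    Hypothetical layout: all data sets share the same gene rows, whereas the
    real data sets have different numbers of matched probes.
    """
    counts = np.zeros(datasets[0][0].shape[0], dtype=int)
    for acgh, expr in datasets:
        counts += own_cn_expression_padj(acgh, expr) < alpha
    return counts
```

The histogram in Figure 2 would then be a histogram of these counts over the genes that pass the combined test.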
Figure 3. Breakdown of potential regulators by the number of data sets with and without aCGH/expression correlation and with and without copy number variation.
Genes have been grouped according to the number of data sets in which they displayed significant aCGH/expression correlation (from 1 data set up to the maximum of 17 data sets). These groups are displayed along the horizontal axis. For each group the following five averages were calculated and displayed in the graph: 1. The average number of data sets where genes are not annotated (white bars). 2. The average number of data sets where genes do not have significant aCGH/expression correlation and do not show copy number variation (pink bars). 3. The average number of data sets where genes do not have significant aCGH/expression correlation but do show copy number variation (red bars). 4. The average number of data sets where genes have significant aCGH/expression correlation and no copy number variation (light blue bars). 5. The average number of data sets where genes have significant aCGH/expression correlation and copy number variation (dark blue bars). The presence of copy number variation was defined by the arbitrary threshold discussed in the text.
Figure 4. Boxplot showing the within-data-set cross-validation consistency.
For the 30 data sets, (a) enrichment scores and (b) average B-H adjusted p-values of the enrichment scores. Each data set was randomly split in half. In each half, genes were ranked by the Spearman correlation of their aCGH and expression values. The top 10 genes from the first half were used as a gene set and scored for enrichment in the ranked list of the second half. This was repeated for 10 random divisions of each data set.
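The split-half procedure can be sketched as follows. The caption does not name the enrichment statistic, so this sketch assumes an unweighted running-sum (Kolmogorov-Smirnov-style) score of the kind used in early gene set enrichment analysis; the function names and the genes × samples array layout are hypothetical, and the permutation p-values shown in panel (b) are omitted.

```python
import numpy as np
from scipy.stats import spearmanr


def rank_genes(acgh, expr):
    """Gene indices ordered by decreasing Spearman rho of aCGH vs expression."""
    rho = np.array([spearmanr(a, e)[0] for a, e in zip(acgh, expr)])
    return np.argsort(-rho)


def enrichment_score(ranked_genes, gene_set):
    """Assumed statistic: unweighted running-sum (KS-style) enrichment score."""
    in_set = np.isin(ranked_genes, np.asarray(gene_set))
    n_hit = in_set.sum()
    n_miss = in_set.size - n_hit
    # Step up for genes in the set, down otherwise (hits total +1, misses -1).
    running = np.cumsum(np.where(in_set, 1.0 / n_hit, -1.0 / n_miss))
    return running[np.argmax(np.abs(running))]


def split_half_consistency(acgh, expr, top_k=10, n_splits=10, seed=0):
    """Score the first half's top_k genes against the second half's ranking."""
    rng = np.random.default_rng(seed)
    n_samples = acgh.shape[1]
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        first, second = perm[: n_samples // 2], perm[n_samples // 2:]
        top_first = rank_genes(acgh[:, first], expr[:, first])[:top_k]
        ranked_second = rank_genes(acgh[:, second], expr[:, second])
        scores.append(enrichment_score(ranked_second, top_first))
    return np.array(scores)
```

A data set whose strongly copy-number-driven genes are the same in both halves yields scores close to 1, which is what panel (a) summarises across the 10 random divisions.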
Figure 5. Clustering data sets according to enrichment scores.
Spearman correlation of genes' aCGH and expression values was used to rank genes in each data set. The significant genes from one data set were used as a gene set and scored for enrichment in the ranked list of a second data set, and vice versa. The two enrichment scores were averaged and this value minus one used as a distance measure for clustering, using Ward's method. The nine data sets with low within-data-set consistency were excluded from the clustering (pr = prostate, lg = lung, oa = oesophageal, ly = lymphoma, bl = bladder, br = breast, ne = neuroblastoma, pl = pleural, ps = myeloma, pn = pancreas, ga = gastric, bn = glioma).
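The clustering step can be sketched with SciPy's hierarchical clustering. As an assumption, the sketch turns the averaged enrichment score into a non-negative distance as one minus the score; `es_matrix` and `cluster_datasets` are hypothetical names, and Ward's method is applied to this non-Euclidean distance only as a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_datasets(es_matrix, n_clusters=2):
    """Ward clustering of data sets from pairwise enrichment scores.

    es_matrix[i, j]: enrichment score of data set i's significant genes
    in data set j's ranked gene list (hypothetical input).
    """
    es = np.asarray(es_matrix, dtype=float)
    sym = (es + es.T) / 2.0          # average the two directions
    dist = 1.0 - sym                 # assumed transform to a non-negative distance
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Data sets whose significant genes are mutually enriched end up with small distances and fall into the same cluster.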
Figure 6. Bar charts showing the number of predicted targets for each potential regulator.
At significance levels of 0.05 (red) and 0.1 (blue): (a) positive regulation, top 30 potential regulators that are not transcription factors (TF); (b) negative regulation, top 30 potential regulators that are not TF; (c) positive regulation, top 30 potential regulators that are TF; (d) negative regulation, top 30 potential regulators that are TF.
Figure 7. Histogram plotting, for HSBP1, the number of predicted targets that are significant (B-H adjusted p-value <0.1) in different numbers of data sets.
For each regulator, comparison of the percentage of data sets that, when analysed individually, predict at least one of the targets predicted by the meta-analysis, with the percentage of data sets in which the gene set of targets predicted by the meta-analysis is significantly enriched in the individual data set's ranked list of genes.
| Gene | data sets | % Containing | % Enriched |
| Positive (not TF) | |||
| UCHL3 | 14 | 21 | 14 |
| Negative (not TF) | |||
| MED4 | 17 | 47 | 53 |
| DDX10 | 15 | 47 | 33 |
| BCL9 | 15 | 40 | 33 |
| PTDSS1 | 14 | 29 | 50 |
| AARS | 14 | 43 | 57 |
| TBCE | 14 | 29 | 14 |
| RIPK2 | 14 | 29 | 64 |
| Positive (TF) | |||
| HSBP1 | 12 | 58 | 58 |
| POGZ | 12 | 58 | 100 |
| SMAD5 | 10 | 70 | 70 |
| Negative (TF) | |||
| SETDB1 | 14 | 36 | 36 |
| YWHAZ | 13 | 46 | 69 |
| HSBP1 | 12 | 75 | 75 |
| POGZ | 12 | 67 | 91 |
| NFATC3 | 12 | 50 | 50 |
| RB1 | 12 | 33 | 58 |
| E2F5 | 11 | 36 | 55 |
| ADAR | 11 | 18 | 18 |
| SMAD5 | 10 | 60 | 70 |
| NCOA6 | 10 | 20 | 20 |
| ARNT | 10 | 50 | 80 |
data sets = number of data sets in which the regulator shows significant correlation between its own aCGH and expression; % Containing = percentage of data sets that, when analysed individually, predict at least one of the targets predicted by the meta-analysis; % Enriched = percentage of data sets in which the gene set of targets predicted by the meta-analysis is significantly enriched in the individual data set's ranked list of genes; TF = Transcription Factor.
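The two percentage columns can be sketched as a simple computation over per-data-set results. The inputs are hypothetical: a collection of target sets predicted in each individual data set, and a boolean per data set marking whether the meta-analysis target set was significantly enriched in that data set's ranked gene list. Using the "data sets" column as the denominator is an assumption, consistent with the table (for example, 47% of 17 data sets for MED4 corresponds to 8 data sets).

```python
def containing_and_enriched(meta_targets, dataset_targets, dataset_enriched):
    """Return (% Containing, % Enriched), rounded to whole percentages.

    meta_targets: targets predicted by the meta-analysis for one regulator.
    dataset_targets: per data set, the targets predicted when analysed alone.
    dataset_enriched: per data set, True if the meta-analysis target set was
        significantly enriched in that data set's ranked gene list.
    """
    meta = set(meta_targets)
    n = len(dataset_targets)
    # Data sets that recover at least one meta-analysis target.
    containing = sum(1 for targets in dataset_targets if meta & set(targets))
    enriched = sum(bool(flag) for flag in dataset_enriched)
    return round(100 * containing / n), round(100 * enriched / n)
```

With 14 data sets, of which 3 contain a meta-analysis target and 2 show enrichment, the function returns (21, 14), the same values as the UCHL3 row.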
Supporting evidence for regulator-target predictions.
| Regulator | N. of Tg. | Co-Cites | Tg. Co-Cites | Enriched GO annotations | GO | Enriched Pathways | Path. |
| Positive (not TF) | | | | | | | |
| UCHL3 | 1 | | n/a | - | | | |
| Negative (not TF) | | | | | | | |
| MED4 | 14 | | 1 (2) | GO:0065004 protein-DNA complex assembly | 0.153 (2/97) | Resolution of Sister Chromatid Cohesion (R) | 0.03 (2/77) |
| DDX10 | 11 | | 4 (2) | GO:0048858 cell projection morphogenesis | 0.001 (5/350) | EGFR downregulation (R) | 0.005 (2/13) |
| BCL9 | 12 | | 1 (2) | GO:0002683 negative regulation of immune system process | 0.01 (3/91) | - | - |
| AZIN1 | 68 | | 68 (5) | GO:0005515 protein binding | 0.004 (44/3651) | ALK1 signaling events (P) | 0.002 (4/20) |
| PTDSS1 | 6 | | 3 (2) | GO:0033627 cell adhesion mediated by integrin | 0.006 (2/28) | - | - |
| AARS | 17 | | 3 (2) | GO:0033059 cellular pigmentation | 0.03 (2/14) | NGF signalling via TRKA from the plasma membrane (R) | 0.04 (2/95) |
| TBCE | 5 | | 0 | GO:0000226 microtubule cytoskeleton organization | 0.04 (2/116) | - | - |
| RIPK2 | 12 | | 1 (2) | GO:0030097 hemopoiesis | 0.03 (4/302) | Class B/2 (Secretin family receptors) (W) | 0.03 (2/43) |
| Positive (TF) | | | | | | | |
| HSBP1 | 70 | | 216 (4) | GO:0002697 regulation of immune effector process | 0.008 (8/154) | Primary immunodeficiency - H. sapiens (K) | 0.007 (4/24) |
| POGZ | 142 | | 339 (6) | GO:0019222 regulation of metabolic process | 1.25e-06 (77/2287) | Mismatch repair - H. sapiens (K) | 0.02 (4/15) |
| SMAD5 | 25 | | 2 (2) | - | - | Host Interactions of HIV factors (R) | 0.04 (2/26) |
| Negative (TF) | | | | | | | |
| SETDB1 | 8 | | 0 | GO:0048589 developmental growth | 0.14 (2/140) | - | - |
| YWHAZ | 67 | | 19 (3) | GO:0005085 guanyl-nucleotide exchange factor activity | 0.03 (5/103) | Alpha4 beta1 integrin signaling events (P) | 0.06 (3/23) |
| HSBP1 | 68 | | 51 (4) | GO:0019058 viral infectious cycle | 0.03 (7/143) | Apoptotic execution phase (R) | 0.06 (3/26) |
| POGZ | 311 | | 650 (9) | GO:0044419 interspecies interaction between organisms | 0.0002 (28/256) | Phagosome - Homo sapiens (K) | 0.003 (15/84) |
| NFATC3 | 23 | | 4 (2) | GO:0022604 regulation of cell morphogenesis | 0.006 (5/143) | Fc-epsilon receptor I signaling in mast cells (P) | 0.001 (3/24) |
| RB1 | 4 | | 0 | GO:0036211 protein modification process | 0.002 (5/1278) | miR-targeted genes in epithelium - TarBase (W) | 0.005 (2/131) |
| E2F5 | 15 | | 38 (3) | GO:0034329 cell junction assembly | 0.06 (3/123) | Integrin cell surface interactions (P) | 0.002 (3/45) |
| ADAR | 1 | | n/a | GO:0034097 response to cytokine stimulus | 0.02 (2/276) | - | - |
| SMAD5 | 24 | | 3 (3) | GO:0065008 regulation of biological quality | 0.03 (11/1159) | Cytosolic sensors of pathogen-associated DNA (R) | 0.01 (2/16) |
| NCOA6 | 1 | | n/a | - | - | - | - |
| ARNT | 22 | | 6 (3) | GO:0009057 macromolecule catabolic process | 0.002 (8/450) | HIF-2-alpha transcription factor network (P) | 0.04 (2/17) |
TF = Transcription Factor; N. of Tg. = number of predicted targets at an FDR significance level of 0.05; Co-Cites = number of papers that co-cite both the regulator and a predicted target, from PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) using Bioconductor package org.Hs.eg.db version 2.9.0 [80] restricted to papers with fewer than 150 gene links, and also from a manual search of PMC (http://www.ncbi.nlm.nih.gov/pmc/); Tg. Co-Cites = number of papers that cite at least two of the predicted targets, with (in brackets) the maximum number of targets in any one paper, from PubMed using Bioconductor package org.Hs.eg.db version 2.9.0 [80] restricted to papers with fewer than 150 gene links; Enriched GO annotations and Pathways were obtained using ConsensusPathDB [81]–[83] (R = Reactome, W = WikiPathways, P = Pathway Interaction Database, K = KEGG), with q-values and (in brackets) the number of genes from the list (a regulator and its predicted targets) that fall in the GO annotation or pathway, followed by the total number of genes in that GO annotation or pathway.
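The two PubMed-based co-citation counts can be sketched as follows, assuming a hypothetical mapping `paper_genes` from PubMed IDs to the set of genes linked to each paper. In the study this information came from org.Hs.eg.db; the sketch does not call that package, and it does not model the manual PMC search that also contributes to Co-Cites.

```python
def co_citation_counts(regulator, targets, paper_genes, max_links=150):
    """Return (Co-Cites, Tg. Co-Cites, max targets in one paper).

    paper_genes: hypothetical dict mapping a paper ID to a set of gene symbols.
    """
    targets = set(targets)
    co_cites = tg_co_cites = max_in_paper = 0
    for genes in paper_genes.values():
        if len(genes) >= max_links:
            continue  # keep only papers with fewer than max_links gene links
        hits = len(genes & targets)
        if regulator in genes and hits >= 1:
            co_cites += 1  # paper mentions the regulator and a predicted target
        if hits >= 2:
            tg_co_cites += 1  # paper mentions at least two predicted targets
            max_in_paper = max(max_in_paper, hits)
    if len(targets) < 2:
        # Target co-citation is undefined ("n/a") for a single predicted target.
        return co_cites, None, None
    return co_cites, tg_co_cites, max_in_paper
```

The bracketed figure in the Tg. Co-Cites column corresponds to the third value returned here.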