Tamasin N Doig, David A Hume, Thanasis Theocharidis, John R Goodlad, Christopher D Gregory, Tom C Freeman.
Abstract
BACKGROUND: Biopsies taken from individual tumours exhibit extensive differences in their cellular composition, owing to the inherent heterogeneity of cancers and the vagaries of sample collection. As a result, genes expressed in specific cell types, or associated with certain biological processes, are detected at widely variable levels across samples in transcriptomic analyses. This heterogeneity also means that the expression level of genes expressed specifically in a given cell type or process will vary in line with the number of those cells within a sample or the activity of the pathway, and such genes will therefore be correlated in their expression.
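A toy simulation makes this reasoning concrete. The sketch below is a hypothetical mixture model, not the authors' method. Each simulated biopsy receives a random macrophage fraction. The bulk signal of two macrophage-specific genes (CD68 and CD163, both listed among the macrophage cluster's example genes) is proportional to that fraction plus noise, while a control gene ignores it. The per-cell expression values, noise levels, and the helper names `pearson` and `simulate_biopsies` are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_biopsies(n_samples=200, seed=1):
    """Hypothetical mixture model of bulk expression across tumour biopsies."""
    rng = random.Random(seed)
    cd68, cd163, control = [], [], []
    for _ in range(n_samples):
        # Fraction of the biopsy made up of macrophages varies from tumour to tumour.
        frac = rng.uniform(0.01, 0.30)
        # Bulk signal = assumed per-cell expression x cell fraction + measurement noise.
        cd68.append(1000 * frac + rng.gauss(0, 10))
        cd163.append(400 * frac + rng.gauss(0, 10))
        # A gene whose level does not depend on macrophage content.
        control.append(200 + rng.gauss(0, 10))
    return cd68, cd163, control


cd68, cd163, control = simulate_biopsies()
print(f"r(CD68, CD163)   = {pearson(cd68, cd163):.2f}")
print(f"r(CD68, control) = {pearson(cd68, control):.2f}")
```

Under these assumptions, the two macrophage genes correlate strongly because both track the same hidden cell fraction. The control gene's correlation with them stays close to zero.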
Year: 2013 PMID: 23845084 PMCID: PMC3721986 DOI: 10.1186/1471-2164-14-469
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Rationale behind the study. The relative number of a specific cell type, or the activity of certain pathways, will vary across a collection of individual tumours. For example, the macrophage content (Φ) will differ in every tumour, and so, therefore, will the mRNA level of macrophage-specific genes (in blue). Similarly, the number of cells in mitosis (the mitotic index) will differ between tumours at the point of sampling, and this will be reflected in different levels of expression of cell cycle genes (in red). As a result, the expression levels of genes expressed specifically by those cells, or associated specifically with those pathways, will vary accordingly. Calculating the correlation coefficient between every pair of genes on the array yields a correlation matrix containing all of these coefficients. Graphs are then used to visualise relationships above a given correlation threshold, and clustering is used to identify groups of co-expressed genes.
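The correlation-matrix and threshold steps in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study. Expression values are held in a plain dict mapping a probeset name to its per-sample values. Pearson's r is computed for every pair, and only pairs at or above the threshold become edges. The gene values in the example are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_graph(expr, threshold):
    """Return {(gene_a, gene_b): r} for every gene pair with Pearson r >= threshold.

    expr maps a gene/probeset name to its expression values, one per sample.
    Genes with no partner above the threshold do not appear in any edge.
    """
    edges = {}
    for a, b in combinations(sorted(expr), 2):  # every unordered pair once
        r = pearson(expr[a], expr[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical toy data: 4 genes measured in 5 tumours.
expr = {
    "CD68": [2.0, 4.0, 6.0, 8.0, 10.0],
    "CD163": [1.0, 2.1, 2.9, 4.2, 5.0],
    "AURKA": [5.0, 1.0, 4.0, 2.0, 3.0],
    "BUB1": [10.0, 2.0, 8.0, 4.2, 6.0],
}
edges = coexpression_graph(expr, threshold=0.75)
print(sorted(edges))  # [('AURKA', 'BUB1'), ('CD163', 'CD68')]
```

A probeset with a constant profile would make the denominator zero, so flat probesets would need to be filtered out before this step.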
Figure 2. Network graphs derived from six cancer datasets: a) breast carcinoma, b) colorectal carcinoma, c) DLBCL, d) glioma, e) ovarian carcinoma and f) testicular germ cell tumours. Each dataset had its own idiosyncrasies, owing to the tumour-specific biology it represents and the high degree of inherent variation in gene expression data derived from cancer samples. To visualise and analyse a large proportion of the genes expressed in these samples, we aimed to construct graphs for each cancer type that included just under half of the transcripts represented on the chip (18,000-23,000 probesets). Relatively low correlation thresholds (r = 0.65 to 0.75) were therefore used for graph construction. The resulting graphs of individual cancer datasets are highly structured and composed of a large number of nodes and edges (18,934 to 23,015 nodes, connected by 268,471 to 954,082 edges; see Table 1 for details).
Summary of the gene coexpression clusters conserved across all datasets studied here
| Immune | Macrophage | 7 | 220 | 163 | CD68, CD14, CD163, CSF1R, Fc Receptors (CD16, CD32, CD64), MHC II molecules | Immune system process (3.96e-32); Defence response (8.79e-22) | | 16 | 1.3E-120 |
| | T cell | 8 | 181 | 145 | CD2, CD3, CD6, CD7, CD52, TCR | Immune system process (4.99e-35); Signal transduction (7.32e-18); T cell activation (2.58e-19) | TCR signalling pathway (6.1e-25)* | 6 | 3.29E-93 |
| | Macrophage/T cell interface | 13 | 87 | 58 | | Immune response (7.20e-06) | | 34 | 2.6E-13 |
| | IFN response | 12 | 115 | 73 | GBP1, IFI27, IRF1, IRF2, OAS1, SP100, STAT1 | Immune response (4.32e-26); Response to virus (1.15e-21) | Genes upregulated by IFNB in HT1080 (1.48e-47)**; Genes upregulated by IFNA in HT1080 (1.05e-43)** | 51 | 4.06E-56 |
| | MHC class I | 19 | 35 | 16 | HLA-A, HLA-B, HLA-C, HLA-E, B2M | Antigen processing and presentation (5.51e-17) | MHC 1 (9.29e-15)*** | 82 | 1.2E-58 |
| | Ig/plasma cell | 14 | 85 | 36 | Ig light and heavy chains | Immune response (7.49e-10) | Ig C region (1.12e-10) | 21 | 2.48E-129 |
| | B-cell | 79, 142 | 10, 7 | 6, 6 | CD19, CD20, CD79 | B-cell receptor complex (2.31e-06); Immune response (1.84e-08) | BCR signalling (1.28e-09)* | 6 | 6.52E-17 |
| | Mast cell | 93 | 9 | 4 | Tryptase, Fc receptor for IgE | Proteolysis (0.008) | Zymogen (4.77e-04)*** | 167 | 2.72E-27 |
| | AP1 response | 48, 89 | 13, 9 | 11, 6 | FOS, JUNB | Sequence specific DNA binding (2.19e-06); Regulation of cellular process (1.2e-04) | DNA binding (2.75e-09)*** | - | - |
| Stroma | Extracellular matrix | 9 | 163 | 100 | BGN, CALD1, FN1, collagens | Extracellular matrix (1.93e-40); Cell adhesion (1.16e-17) | | 88 | 5.5E-34 |
| | Adipocyte | 27 | 22 | 15 | ADIPOQ, LPL | Response to wounding (0.004) | PPAR signalling (1.46e-07)*; Adipocyte vs Fibroblast upregulated (1.49e-15)** | - | - |
| | Endothelium | 29, 38, 49 | 20, 17, 13 | 17, 13, 11 | CD31, CD34, Endomucin, Endoglin, vWF | Cell adhesion (1.3e-08); Blood vessel development (1.28e-10) | Upregulated in glomerul in DM vs normal (8.33e-14)**; Brentani_Angiogenesis (6.87e-08)** | 58 | 6.75E-16 |
| | Endothelium/ECM | 59 | 11 | 6 | COL4A1, COL4A2 | Cell adhesion (4.13e-05) | | 215 | 1.22E-12 |
| | Smooth muscle | 88, 249 | 9, 6 | 5, 4 | Alpha SMA, calponin | Smooth muscle contraction (4.75e-05) | Muscle protein (3.95e-08)*** | 907 | 1.27E-11 |
| | Skeletal muscle | 46 | 15 | 15 | Myoglobin, CKm, Myosin | Contractile fibre part (1.53e-17); Muscle development (2.09e-05) | | 23 | 2.55E-31 |
| Cell cycle | Cell cycle | 6 | 239 | 182 | AURKA, BUB1, CHEK2, CDC2, MCM2 | Cell cycle (3.06e-59); DNA replication (1.48e-39) | Serum fibroblast cell cycle (1.17e-117)** | 101 | 1.31E-26 |
| | Cell cycle related | 10, 16, 26 | 147, 52, 23 | 125, 44, 19 | | RNA binding (2.29e-11) | RNA binding (2.29e-14)*** | - | - |
| Ribosomal | Ribosomal | 54, 60, 64, 97 | 12, 11, 11, 8 | 12, 8, 6, 5 | RPL38, RPS10, RPS19 | Cytosolic ribosome (7.91e-13) | Ribosomal protein (8.44e-11)***; Protein biosynthesis (1.62e-06)*** | 54 | 4.25E-36 |
| Other functional classes | Histones | 20 | 30 | 26 | HIST1H1C, HIST1H2AB, HIST1H3H | Nucleosome (9.75e-23); Chromatin assembly (9.24e-21) | Nucleosome core (2.61e-21)*** | 160 | 9.15E-35 |
| | Glycolysis | 47 | 13 | 5 | GAPDH, GPI | Glycolysis (1.46e-05) | Gluconeogenesis (9.37e-07)*** | 151 | 6E-30 |
| | Haemoglobins | 91 | 9 | 2 | HBA1, HBB | | | 144 | 6E-31 |
| Affymetrix controls | Affymetrix controls | 23, 28 | 26, 22 | 26, 22 | | | | 99 | 4.57E-59 |
Each cluster, or group of related clusters, has been placed into a functional grouping based on the biology from which it is derived. Details of the cluster(s) are provided together with selected pathway and Gene Ontology enrichment scores for the genes that make up the clusters (for a complete list, see Additional file 4).
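The enrichment scores in this table are p-values, but the test behind them is not stated here. One common choice for this kind of Gene Ontology or pathway enrichment is a one-sided hypergeometric test. It is sketched below as an assumption rather than as the study's procedure. Given N background genes, K of which carry a term, the test gives the probability of seeing at least k term-carrying genes in a cluster of n genes by chance. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all genes on the array)
    K: background genes annotated with the term
    n: genes in the cluster
    k: cluster genes annotated with the term
    """
    total = comb(N, n)
    # Sum the ways of drawing i annotated genes, for i = k .. min(n, K).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Toy example: 10 genes, 4 annotated; a 3-gene cluster containing 2 of them.
print(hypergeom_enrichment_p(2, 3, 4, 10))  # 0.3333333333333333
```

When many terms are tested per cluster, p-values like these are usually corrected for multiple testing before being reported.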
Figure 3. Clusters of transcripts (left) derived from the testicular cancer dataset and (right) the associated expression profiles (average signal per gene or cluster). The individual tumours (represented along the y-axis) were grouped by mixed (green in upper bar) or pure histological subtype (red in upper bar) and then by the components present (coloured blocks in lower bar). a) Haemoglobin cluster containing data derived from 5 probesets designed to HBA1 and 3 designed to HBB. The haemoglobin locus cluster is present in many human expression datasets and is often unconnected to any other genes. b) Cluster of somatotrophin genes, whose expression is normally tissue specific and limited to the pituitary and placenta, shown here to be expressed predominantly in tumours containing elements of choriocarcinoma. Choriocarcinoma is a tumour formed of malignant trophoblast cells, whose normal counterparts are involved in placenta formation. c) HOX genes found to be expressed only in two teratomas with secondary carcinomatous transformation. Interestingly, one of these tumours shows a high degree of upregulation of one group of primarily HOXB genes, and the other a mix of HOXA/C/D genes.
Figure 4. Network derived from the testicular cancer dataset (Pearson correlation threshold r = 0.75). a) Network with only edges shown, allowing visualisation of the inherently complex topology of the graph, and b) with nodes shown and coloured according to cluster membership. c) Graph with selected clusters shown and d) the average expression profile of genes within those clusters (the cluster colour code is maintained across graphs in this figure). Cluster 68 is highly enriched with endothelial marker genes, and cluster 23 contains many transcripts known to be associated with extracellular matrix. The last three selected clusters can be associated with different aspects of the immune infiltrate in these tumours. Cluster 2 contains many known markers of T cells and B cells, cluster 4 (also shown in Figure 2) is enriched with many known macrophage-expressed genes, and cluster 10 is highly enriched with interferon response genes.
Figure 5. Network graph of conserved transcription signatures in cancer. a) 3D graph layout with labelling of the main features in the network's topology. A graph of 9,882 nodes and 184,563 edges was created at a Pearson threshold of r ≥ 0.6. Clustering of the graph using the MCL algorithm resulted in 639 clusters ranging in size from 1,008 nodes to 4 nodes. A number of large clusters were highly enriched in genes associated with expression in individual cell types and/or with specific cellular functions (see Additional files 3 and 4 for details). b) Collapsed cluster diagram showing the conserved gene network as a simplified 2D network. Single nodes represent gene clusters and are sized according to the number of transcripts within the cluster, and edges represent connections between members of each cluster. Nodes representing the main clusters have been coloured according to functional grouping. Blue: clusters associated with housekeeping functions. Green: clusters of genes that are directly involved in cell cycle progression or whose expression is in some way linked to it. Pink: genes associated with the immune component of the tumour. Yellow: other stromal elements. Smaller clusters enriched with genes of known function are also shown.
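The MCL (Markov Cluster) algorithm named in this caption alternates two operations on a column-stochastic matrix of random-walk probabilities. Expansion (matrix squaring) spreads flow along paths. Inflation (raising entries to a power and renormalising) strengthens strong flows and weakens weak ones until the graph separates into clusters. The pure-Python sketch below illustrates the idea. The self-loops, the inflation value of 2.0, the pruning cut-off, and the helper names are assumptions, not the settings used in the study. A run on a graph of roughly 10,000 nodes would need a sparse implementation.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(nodes, edges, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Cluster an undirected weighted graph; edges maps (a, b) -> weight."""
    n = len(nodes)
    idx = {name: i for i, name in enumerate(nodes)}
    # Weighted adjacency matrix with a self-loop on every node (assumed default).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), w in edges.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = w
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        pruned = normalize_columns(
            [[v if v > prune else 0.0 for v in row] for row in inflated])
        delta = max(abs(pruned[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = pruned
        if delta < tol:
            break
    # Nodes whose flow ends on a shared attractor row go in the same cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        support = [j for j in range(n) if m[i][j] > 0]
        for j in support[1:]:
            parent[find(j)] = find(support[0])
    clusters = {}
    for j in range(n):
        clusters.setdefault(find(j), set()).add(nodes[j])
    return list(clusters.values())


# Two tightly linked triples joined by one weak edge (hypothetical weights).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = {("A", "B"): 0.9, ("A", "C"): 0.9, ("B", "C"): 0.9,
         ("D", "E"): 0.9, ("D", "F"): 0.9, ("E", "F"): 0.9,
         ("C", "D"): 0.1}
print(sorted(sorted(c) for c in mcl(nodes, edges)))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Higher inflation values generally give smaller, tighter clusters. This is the main knob that controls cluster granularity in MCL.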
Figure 6. Conservation of transcriptional signatures in the graph derived from the skin cancer dataset (Pearson correlation threshold r = 0.80). To give a clear view of the transcripts and clusters within the skin cancer graph, the graph has been simplified. Edges derived from the relationships between clusters form a central framework, the nodes representing the transcripts in each cluster are joined to a central node representing that cluster, and the graph is laid out in 2D. Only clusters comprising more than 8 probesets were included. a) Colours represent different clusters in the skin cancer data, and b) clusters from the merged cancer graph (r = 0.6) are overlaid using larger nodes. Many of the housekeeping clusters (1–5, 11) can be seen to be conserved. So can a proportion of the cell cycle (6), macrophage (7), T cell (8), ECM (9), interferon response (12), plasma cell (14), MHC class I (19), histone (20) and Affymetrix control (23) clusters. However, many of the skin cancer clusters are not represented in the merged cancer profile set, these transcriptional signatures being unique to skin cancers.
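The caption judges conservation visually, by overlaying merged-graph clusters on the skin cancer clusters. A simple way to put a number on that overlap, offered here only as a hypothetical sketch, is to find, for each skin cancer cluster, the merged cluster that shares the most probesets. The sketch then reports the fraction of the skin cluster that this merged cluster covers. A fraction near zero flags a signature with no counterpart in the merged set. The function name, cluster IDs, and probeset names in the example are invented.

```python
def best_matches(clusters_a, clusters_b):
    """For each cluster in clusters_a, find the clusters_b cluster sharing most members.

    Both arguments map a cluster ID to a set of probeset IDs. Returns
    {a_id: (best_b_id or None, fraction of a's members found in best_b)}.
    """
    result = {}
    for a_id, a_members in clusters_a.items():
        best_id, best_overlap = None, 0
        for b_id, b_members in clusters_b.items():
            overlap = len(a_members & b_members)
            if overlap > best_overlap:
                best_id, best_overlap = b_id, overlap
        result[a_id] = (best_id, best_overlap / len(a_members))
    return result


# Hypothetical clusters: skin cancer graph vs merged cancer graph.
skin = {"skin_1": {"p1", "p2", "p3", "p4"}, "skin_2": {"p5", "p6"}}
merged = {7: {"p1", "p2", "p3", "p9"}, 8: {"p10"}}
print(best_matches(skin, merged))
# {'skin_1': (7, 0.75), 'skin_2': (None, 0.0)}
```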
List of the cancer datasets used for this study
| Dataset | Reference (PMID) | Tumour type | Samples | Array | Nodes | Edges |
| GSE11318 | Lenz et al. (18765795) | DLBCL | 194 | Plus 2.0 | 19,850 | 614,273 |
| GSE1456 | Pawitan et al. (16280042) | Breast carcinoma | 134 | A & B | 19,246 | 559,761 |
| GSE9891 | Tothill et al. (18698038) | Ovarian (epithelial) carcinoma | 265 | Plus 2.0 | 19,415 | 268,471 |
| GSE3218 | Korkola et al. (16424014) | Testicular germ cell tumours | 86 | A & B | 18,934 | 954,082 |
| GSE13294 | Jorissen et al. (19088021) | Colorectal carcinoma | 150 | Plus 2.0 | 22,687 | 725,467 |
| caArray/rembr-00037 | REMBRANT – Repository for Molecular Brain Neoplasia Data | Primary CNS tumours | 253 | Plus 2.0 | 23,015 | 623,591 |
The datasets with a white background are the six used for the primary analysis, and the remaining three (bold) are those used to confirm the robustness of the core cancer expression signatures.
Figure 7. Approach to data analysis. a) Flow diagram summarising the analysis pipeline used here. Size of the graph generated from each dataset at different Pearson correlation thresholds, in terms of b) the number of nodes and c) the number of edges.
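Panels b and c track how graph size changes with the correlation threshold. The sweep can be sketched as follows, assuming that a probeset enters the graph only when it has at least one edge at or above the threshold. That rule is an assumption of this sketch. The function name and toy correlations are hypothetical.

```python
def graph_size_by_threshold(correlations, thresholds):
    """Map each threshold to (number of nodes, number of edges).

    correlations maps a gene pair (a, b) to its Pearson r. A gene counts as a
    node only if at least one of its pairs reaches the threshold (assumption).
    """
    sizes = {}
    for t in thresholds:
        kept = [pair for pair, r in correlations.items() if r >= t]
        nodes = {gene for pair in kept for gene in pair}
        sizes[t] = (len(nodes), len(kept))
    return sizes


# Hypothetical pairwise correlations for four genes.
corr = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "c"): 0.65, ("c", "d"): 0.5}
print(graph_size_by_threshold(corr, [0.6, 0.7, 0.8]))
# {0.6: (3, 3), 0.7: (3, 2), 0.8: (2, 1)}
```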