Eva Budinska, Vlad Popovici, Sabine Tejpar, Giovanni D'Ario, Nicolas Lapique, Katarzyna Otylia Sikora, Antonio Fabio Di Narzo, Pu Yan, John Graeme Hodgson, Scott Weinrich, Fred Bosman, Arnaud Roth, Mauro Delorenzi.
Abstract
The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported a subgroup defined by an EMT expression signature. We performed a prior-free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and compared our findings with established molecular determinants and clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with a literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features differed significantly between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026.
Keywords: colorectal cancer; gene expression; histopathology; molecular heterogeneity
Year: 2013 PMID: 23836465 PMCID: PMC3840702 DOI: 10.1002/path.4212
Source DB: PubMed Journal: J Pathol ISSN: 0022-3417 Impact factor: 7.996
Figure 4. Morphological CRC patterns. (A) Morphological CRC patterns scored in subtypes. (B, C) Distribution of dominant (B) and secondary (C) histological patterns in subtypes. Columns represent subtypes and widths are proportional to subtype frequency (numbers of samples in each subtype); rows represent dominant (B) or secondary (C) patterns and heights are proportional to pattern frequency. Boxes show adjusted p values of pairwise statistical testing of morphological pattern distribution between subtypes.
Figure 1. Meta-gene expression pattern in subtypes, connected with prognostic effect of subtypes and meta-genes, in the discovery set. (A) Two heat maps clustering normal (left) and CRC (right) samples (columns) and meta-genes (rows). Colours represent decreased (blue) or increased (red) meta-gene expression relative to their medians. Normal samples were clustered independently on meta-genes centred to CRC meta-gene medians. For comparative purposes, ordering of meta-genes in normal samples is imposed to correspond to that of CRC samples. White horizontal lines denote eight unsupervised clusters of meta-genes, each assigned a colour bar on the left; meta-genes not belonging to a cluster have no colour bar. Names of the meta-genes corresponding to gene modules with gene–gene correlations in normal samples comparable to those in cancer samples are marked red (see Supplementary material, Figure S1D). (B) Effect of inter-quartile range (IQR) standardized expression of meta-genes on RFS, OS and SAR. Points represent estimated hazard ratio (HR), bars represent 95% CI. Bold lines represent effects significant at 5% without adjustment for multiple hypothesis testing; red lines represent effects significant at FDR < 10%; details are provided in Table S6 (see Supplementary material). (C) Kaplan–Meier plots for RFS, OS and SAR, with HR for significant pairwise comparisons (p values adjusted for FDR). Numbers below x axes represent number of patients at risk at selected time points.
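As a rough illustration of the median centring in panel (A) and the IQR standardization in panel (B), the sketch below rescales the expression values of one meta-gene in Python. The function name `iqr_standardize`, the toy values and the quartile interpolation method are assumptions made for illustration; the source does not state how quartiles were computed.

```python
import statistics

def iqr_standardize(values):
    """Centre values on their median and divide by the inter-quartile range."""
    med = statistics.median(values)
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; values cannot be IQR-standardized")
    return [(v - med) / iqr for v in values]

# Toy expression values for a single hypothetical meta-gene.
scaled = iqr_standardize([1, 2, 3, 4, 5])
```

With this scaling, a hazard ratio estimated for the variable corresponds to a change of one IQR in expression, which makes effects comparable across meta-genes with different spreads.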
Biological identification of gene modules
| Cluster name | Number of genes | Pathway analysis result (number of overlapping genes, p value) | Selected genes |
|---|---|---|---|
| 1. GDC | 27 | Genes involved in differentiation of colon crypt and/or whose expression was reported to be affected in colorectal cancer and/or with prognostic effect in CRC | Intestinal differentiation genes: |
| 2. Chromosome 20q genes | 33 | Chromosome 20 (26 genes, 9.2E-34) | Other, non-20q genes: |
| 3. Proliferation | 83 | Cell cycle (36 genes, 3.0E-33); Mitosis (26 genes, 1.4E-29); Chromosome (26 genes, 2.5E-17); DNA metabolic process (20 genes, 4.9E-10); Lipid synthesis (4 genes, 5.0E-2) | Mitotic checkpoint kinases: |
| 4. Colon crypt markers (secretory cells) | 16 | ||
| 5. EMT/stroma | 310 | Extracellular region part (90 genes, 2.7E-36); Cell adhesion (57 genes, 1.2E-17); Extracellular matrix (44 genes, 5.3E-30); Collagen (16 genes, 1.2E-15); EGF-like domain (26 genes, 1.6E-12); Cell motion (33 genes, 7.2E-8); Blood vessel development (25 genes, 1.1E-8); Growth factor binding (6 genes, 6.0E-5); Frizzled related (5 genes, 6.7E-3); Cell junction organization (7 genes, 1.8E-2); WNT receptor signalling pathway (8 genes, 1.4E-1) | Inhibitors of |
| 6. Unidentified | 14 | ||
| 7 and 8. Immune response | 103 | Immune response (42 genes, 2.0E-28); Positive regulation of immune system process (16 genes, 4.0E-9); Antigen processing and presentation via MHC class II (6 genes, 7.5E-5); Defence response (31 genes, 3.3E-17); Chemokine signalling pathway (9 genes, 2.2E-3); Lymphocyte activation (11 genes, 2.1E-5); Regulation of programmed cell death (14 genes, 2.1E-2) | Cytokines: |
| Meta-gene 105 | 6 | Top of the crypt genes | |
| Meta-gene 144 | 5 | Enterocytes, goblet cells markers | |
| Meta-gene 81 | 7 | Chromosome X (7 genes, 1.1E-8) | |
| Meta-gene 97 | 6 | Chromosome 20p (5 genes, 5.0E-11) | |
| Meta-gene 84 | 7 | Chromosome 8 (7 genes, 5.4E-9) | |
| Meta-gene 141 | 5 | EREG | |
| Meta-gene 112 | 6 | Lipid synthesis (4 genes, 5.0E-2) | |
| Meta-gene 95 | 6 | Homeobox genes | |
| Meta-gene 124 | 5 | Metallothioneins | |
| Meta-gene 131 | 5 | Disulphide bonds (5 genes, 1.7E-02) | |
| Meta-gene 143 | 5 | Unidentified | |
| Meta-gene 80 | 7 | Regulation of RNA metabolic process (6 genes, 4.9E-2) | |
| Meta-gene 71 | 8 | Gut development (3 genes, 3.5E-2) | |
Subtype-specific minimal gene set as identified by Elastic net
| Subtype | Up-regulated from population mean | Down-regulated from population mean |
|---|---|---|
| A. Surface crypt-like | | |
| B. Lower crypt-like | | |
| C. CIMP-H-like | | |
| D. Mesenchymal | | |
| E. Mixed | | |
Result of additive multivariate Cox proportional hazards model, with subtype, BRAF mutation, MSI and stage
| Variable | RFS HR | RFS p | OS HR | OS p | SAR HR | SAR p |
|---|---|---|---|---|---|---|
| A | 0.906 | 0.760 | 1.381 | 0.390 | 1.726 | 0.180 |
| C | 0.940 | 0.850 | 1.560 | 0.220 | 3.675 | 0.0022 |
| D | 1.688 | 0.0055 | 2.161 | 0.0011 | 1.906 | 0.014 |
| E | 1.506 | 0.210 | 2.201 | 0.035 | 2.046 | 0.075 |
| BRAF mutation | 1.633 | 0.085 | 2.472 | 0.0034 | 3.361 | 0.00072 |
| MSI | 0.478 | 0.044 | 0.275 | 0.004 | 0.356 | 0.036 |
| Stage 3 | 0.770 | 0.190 | 0.943 | 0.820 | 1.780 | 0.062 |
Baseline is subtype B, MSS, BRAF wt and Stage 2.
Variables significant in the model.
Hazard ratios (HR) for relapse-free survival (RFS), overall survival (OS) and survival after relapse (SAR).
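Because the model is additive on the log-hazard scale (no interaction terms), the hazard ratio of a covariate profile relative to the baseline is the product of the individual hazard ratios. The following Python sketch illustrates this arithmetic with a subset of the OS hazard ratios from the table above. The dictionary keys and the helper `combined_hr` are hypothetical names introduced here; the calculation ignores confidence intervals and is not meant as a clinical risk estimate.

```python
import math

# OS hazard ratios copied from the table above (labelled rows only).
# Baseline (HR = 1): subtype B, MSS, BRAF wt, Stage 2.
OS_HR = {
    "subtype_A": 1.381,
    "subtype_C": 1.560,
    "subtype_D": 2.161,
    "subtype_E": 2.201,
    "MSI": 0.275,
    "stage_3": 0.943,
}

def combined_hr(levels, hr_table=OS_HR):
    """Multiply hazard ratios by summing log hazard ratios (additive model)."""
    log_hr = sum(math.log(hr_table[level]) for level in levels)
    return math.exp(log_hr)

# Subtype D, Stage 3 tumour relative to a subtype B, Stage 2 tumour.
hr_d_stage3 = combined_hr(["subtype_D", "stage_3"])
```

An empty list of levels returns 1.0, which corresponds to the baseline profile itself.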
Figure 2. Subtypes and biological motifs. Subtype-specific fingerprints of biological motifs, represented either as mean values of gene set enrichment scores of gene sets from corresponding gene modules (EMT/stroma, immune, secretory cells, proliferation, GDC, chromosome 20q, top of the crypt: meta105 and meta144) or composed gene set enrichment scores of particular signatures (canonical Wnt targets, CSC-TopGFP, CSC-EphB2, colon crypt bottom and CIMP-H). The gene set enrichment scores represent whether the genes from the gene set show statistically significant enrichment between the down-regulated (negative scores, light blue area) or up-regulated (positive scores) genes of a given subtype; details of score calculation can be found in the Supplementary material (Supplementary methods and results and Table S7).
Figure 3. Clinical and mutational characterization of subtypes. Columns represent variables and rows subtypes. Horizontal bar plots represent proportions of the corresponding variable in each of the subtypes and non-core samples. Non-core samples were tested as one group to ensure that they did not share a common characteristic that would set them apart. Numbers in brackets adjacent to subtype name represent overall number of samples in the subtype. Under the title of each variable we denote the percentage representing baseline proportion in the population, with available information, and N denotes the number of patients for which the information on the respective feature was available. Bars in red represent significant enrichment and bars in blue significant depletion of a feature in the subtype in comparison to baseline, at the 5% significance level. Adjacent to each bar is the percentage of samples in the subtype with the specific feature and in brackets the overall number of samples in the subtype with the information available. We can read that, for instance, subtype C, comprising 154 samples, is enriched for microsatellite-unstable (MSI) tumours, where 60.4% of 91 samples with available information are MSI.
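The reading example in the caption can be turned into counts, and an enrichment against a baseline proportion can be checked with an exact one-sided binomial test. The Python sketch below is only an illustration: the caption does not say which statistical test was used, so the binomial test is an assumption, and the helper names `count_from_percent` and `binom_upper_tail` are hypothetical. The baseline proportion is left as a parameter because its value is shown in the figure and not given in the text.

```python
import math

def count_from_percent(percent, n_available):
    """Convert a reported percentage back to a whole number of samples."""
    return round(percent / 100 * n_available)

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p); one-sided enrichment p value."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )

# 60.4% of the 91 subtype C samples with available MSI status.
msi_in_c = count_from_percent(60.4, 91)
```

Calling `binom_upper_tail(msi_in_c, 91, baseline)` with the population baseline proportion read from the figure would give the probability of seeing at least that many MSI tumours by chance.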
Summary of subtype characteristics
| Subtype | CRC markers and mutations (MSI and others, in source order) | Histopathology (dominant) | IHC (nuclear) | Median OS (months) | Median RFS (months) | Median SAR (months) | Site | Grade | Up-regulated gene expression | Down-regulated gene expression |
|---|---|---|---|---|---|---|---|---|---|---|
| A: Surface crypt-like | –, + | Papillary or serrated | – | NA | NA | 28.9 | | | Top colon crypt, secretory cell, metallothioneins | EMT/stroma, Wnt, CSC, Chr20q, proliferation |
| B: Lower crypt-like | –, – | Complex tubular | + | NA | NA | 50.4 | Left | 2 | Top colon crypt, proliferation, Wnt | EMT/stroma, immune, secretory cell |
| C: CIMP-H-like | +, +, – | Solid/trabecular or mucinous | – | NA | NA | 6.9 | Right | 3 | Proliferation, immune, metallothioneins | GDC, top colon crypt, Chr20q |
| D: Mesenchymal | | Desmoplastic | – | NA | 79.5 | 19.8 | | | EMT/stroma, CSC, immune | Proliferation, secretory cell, top colon crypt, GDC, Wnt, Chr20q |
| E: Mixed | –, –, + | Complex tubular | + | NA | NA | 19.6 | Left | | EMT/stroma, immune, top colon crypt, Chr20q, GDC, CSC | Secretory cell |
+, significantly enriched; –, significantly depleted; IF, invasion front; NA, not attained; no value, no significant enrichment in comparison to population baseline.