Dvir Netanely, Ayelet Avraham, Adit Ben-Baruch, Ella Evron, Ron Shamir.
Abstract
BACKGROUND: Breast cancer is a heterogeneous disease comprising several biologically different types that exhibit diverse responses to treatment. In recent years, gene expression profiling has led to the definition of several "intrinsic subtypes" of breast cancer (basal-like, HER2-enriched, luminal-A, luminal-B and normal-like), and microarray-based predictors such as PAM50 have been developed. Despite their advantage over traditional histopathological classification, precise identification of breast cancer subtypes, especially within the largest and highly variable luminal-A class, remains a challenge. In this study, we revisited the molecular classification of breast tumors using both expression and methylation data obtained from The Cancer Genome Atlas (TCGA).
Keywords: Breast cancer subtypes; Clustering; DNA methylation; Luminal-A; RNA-Seq; Unsupervised analysis
Year: 2016 PMID: 27386846 PMCID: PMC4936004 DOI: 10.1186/s13058-016-0724-2
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Fig. 1 Global unsupervised clustering of 1148 breast samples using RNA-Seq data. Applying the K-Means algorithm with K = 5 to the RNA-Seq dataset yielded a partition in moderate agreement with the PAM50 labels and the three immunohistochemical markers. Notably, luminal-A samples were split between the rather homogeneous cluster 2 and cluster 1, which is a mix of luminal-A and luminal-B samples. a K-Means clusters. b PAM50 calls. c Estrogen receptor (ER) status. d Progesterone receptor (PR) status. e Human epidermal growth factor receptor 2 (HER2) status
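Fig. 1 and Fig. 2 both rest on K-Means clustering. The captions do not state the distance, initialization, or implementation used, so the following is only a minimal sketch of the standard Lloyd's algorithm, assuming Euclidean distance and random initialization from the data points. `kmeans` and `squared_distance` are hypothetical names, and each sample is a list of expression values:

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples: List[Vector], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: return a cluster label (0..k-1) for each sample."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct samples chosen at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = [-1] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because Lloyd's algorithm only reaches a local optimum, a practical run would repeat it from several seeds and keep the partition with the lowest within-cluster sum of squares.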
Fig. 2 Unsupervised analysis of luminal breast samples using RNA-Seq data. a Applying the K-Means algorithm with K = 2 to the 737 luminal samples splits them into two subgroups with better five-year prognostic value than the PAM50 luminal-A/luminal-B partition. b Five-year survival and recurrence for the two partitions of the luminal samples. The partition into two RNA-Seq-based clusters outperforms the PAM50 partition in both survival and recurrence. P values were calculated using the log-rank test
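The P values in Fig. 2 come from the log-rank test on five-year follow-up. A minimal sketch of the two-group test, assuming follow-up is recorded in months and administratively censored at 60 months; the `(time, event)` record format and the names `truncate_at` and `logrank_test` are hypothetical:

```python
import math
from typing import List, Tuple

# (time_in_months, event): event=1 means death/recurrence was observed,
# event=0 means the patient was censored at that time.
Record = Tuple[float, int]


def truncate_at(records: List[Record], horizon: float = 60.0) -> List[Record]:
    """Administratively censor follow-up at `horizon` months (five years)."""
    return [(t, e) if t <= horizon else (horizon, 0) for t, e in records]


def logrank_test(group_a: List[Record], group_b: List[Record]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        n_a = sum(1 for ti, _ in group_a if ti >= t)
        n_b = sum(1 for ti, _ in group_b if ti >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e in group_a if ti == t and e == 1)
        d_b = sum(1 for ti, e in group_b if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0  # no information to compare the groups
    chi2 = observed_minus_expected ** 2 / variance
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Censoring both groups at the five-year horizon before testing keeps events after 60 months from influencing a five-year comparison.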
Fig. 3 Unsupervised analysis of luminal-A (LumA) breast samples. a Clustering of 534 RNA-Seq profiles partitions the data into two groups with distinct expression profiles. The clusters also show significant enrichment for clinical variables, including recurrence, proliferation score, age, and histology. The bars below the heatmap show, from top to bottom, the partition of the samples, the assignment of each sample in the clustering of all luminal samples (see Fig. 2), histological type, and proliferation score. b Five-year survival and recurrence analysis in the two luminal-A subgroups. LumA-R2 samples exhibit a significantly lower five-year recurrence rate than LumA-R1 samples
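The five-year survival and recurrence curves compared in Fig. 2 and Fig. 3 are presumably Kaplan-Meier estimates (the Fig. 5 caption names its survival plots as such). A minimal sketch of the product-limit estimator, assuming a hypothetical `(time, event)` tuple per patient; `kaplan_meier` is an illustrative name:

```python
from typing import List, Tuple


def kaplan_meier(records: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event_time, survival probability) steps.

    Each record is (time, event): event=1 for an observed event
    (death or recurrence), event=0 for a censored observation.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in records if e == 1}):
        at_risk = sum(1 for ti, _ in records if ti >= t)
        events = sum(1 for ti, e in records if ti == t and e == 1)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve
```

Censored patients never cause a drop in the curve; they only shrink the risk set for later event times.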
The main characteristics distinguishing between the luminal-A subgroups, LumA-R1 and LumA-R2
| Group characteristic | LumA-R1 | LumA-R2 | p value |
|---|---|---|---|
| Recurrence-free survival | Increased recurrence | Reduced recurrence | 7.6e-3 |
| Histological type | Ductal (…) | Lobular (…) | |
| Average age (years) | 61.5 | 57.4 | 2.6e-05 |
| Proliferation score | -0.4 | -0.6 | 8.9e-25 |
| Tumor nuclei (%) | 80 | 73 | 2.6e-12 |
| Normal cells (%) | 2.9 | 6.1 | 2.8e-08 |
| Gene overexpression (number of genes) | 194 | 1068 | |
Average values are shown for each group where relevant. Gene overexpression is computed with respect to the 2000 genes used for clustering.
The most enriched functional categories among the 1000 genes most differentially expressed between LumA-R1 and LumA-R2 samples
| Enrichment type | Term | Number of genes | p value |
|---|---|---|---|
| Gene Ontology | Regulation of immune system process | 152 | 3.74e-50 |
| | Immune system process | 201 | 3.65e-47 |
| | Regulation of leukocyte activation | 71 | 2.37e-28 |
| | Regulation of multicellular organismal process | 183 | 2.89e-28 |
| | Cell activation | 91 | 4.59e-28 |
| | Regulation of response to external | 73 | 8.18e-27 |
| | Regulation of biological quality | 218 | 1.82e-26 |
| | Leukocyte activation | 67 | 1.95e-26 |
| | Positive regulation of cell activation | 56 | 5.13e-24 |
| | T cell activation | 45 | 4.93e-22 |
| | Regulation of cell proliferation | 128 | 1.83e-21 |
| KEGG Pathways | Cytokine-cytokine receptor interaction | 56 | 4.76e-22 |
| | Hematopoietic cell lineage | 29 | 1.50e-17 |
| | Cell adhesion molecules (CAMs) | 30 | 4.08e-13 |
| | Primary immunodeficiency | 16 | 8.70e-13 |
| | Chemokine signaling pathway | 31 | 1.14e-09 |
| | Complement and coagulation cascades | 17 | 1.36e-08 |
| | T cell receptor signaling pathway | 20 | 1.30e-07 |
| | Allograft rejection | 11 | 6.44e-07 |
| | Natural killer cell mediated cytotoxicity | 20 | 5.66e-06 |
| | Pathways in cancer | 34 | 1.49e-05 |
| WikiPathways | TCR signaling pathway | 10 | 1.55e-09 |
| | B cell receptor signaling pathway | 10 | 1.72e-06 |
| | Focal adhesion | 11 | 5.88e-05 |
| | Complement activation, classical pathway | 6 | 8.38e-05 |
| Chromosomal location | 11q23 | 18 | 1.84e-05 |
| | Xq23 | 8 | 4.99e-05 |
All the genes on the list showed significantly higher expression in LumA-R2 samples than in LumA-R1 samples.
Fig. 4 LumA-R2 samples overexpress genes in the T cell receptor signaling pathway. The list of the top 1000 genes differentially expressed between LumA-R1 and LumA-R2 samples was significantly enriched for the pathway genes (p = 1.3e-07). Genes marked in red are overexpressed in LumA-R2 samples. The pathway and graphics were taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
Fig. 5 Unsupervised analysis of breast cancer tumors using DNA methylation data. Samples were clustered by correlation-based K-Means using the 2000 most variable CpGs within each sample subset. a All 679 tumors. b The 579 samples classified as luminal-A or luminal-B by PAM50. c The 378 luminal-A samples only. The first bar below each data matrix shows the assignment of the samples to methylation-based clusters. The second bar shows the PAM50 calls (a and b) or the RNA-Seq-based LumA-R1/R2 subgroups defined in Fig. 3 (c). Right panels show five-year Kaplan-Meier survival plots for the resulting groups
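Fig. 5 clusters samples by K-Means "based on correlation" on the 2000 most variable CpGs. One common way to approximate this, assumed here rather than taken from the paper, is to keep the highest-variance features and z-score each sample: for two standardized profiles of length n, the squared Euclidean distance equals 2n(1 − r), where r is their Pearson correlation, so Euclidean K-Means on standardized profiles groups samples largely by correlation. A sketch with hypothetical function names, where `matrix` has one row per sample and one column per CpG:

```python
import statistics
from typing import List, Sequence


def top_variable_features(matrix: List[Sequence[float]], n_top: int) -> List[int]:
    """Indices of the n_top columns (features) with the highest variance across samples."""
    n_features = len(matrix[0])
    variances = [statistics.pvariance([row[j] for row in matrix]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:n_top]


def standardize(profile: Sequence[float]) -> List[float]:
    """Center and scale one sample profile to mean 0 and population SD 1."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mean) / sd for x in profile]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation computed as the mean product of z-scores."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)
```

The selection step would be applied to each sample subset separately (all tumors, luminal samples, luminal-A samples), as the caption describes, before standardizing and clustering.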
Gene enrichment in the three subsets of CpGs exhibiting differential methylation between the LumA-M1 and LumA-M3 subgroups
| Enrichment type | (1) Hyper-methylated CpGs (1000 CpGs, 483 genes) | p value | (2) Hyper-methylated, negatively correlated with expression (586 CpGs, 340 genes) | p value | (3) Hyper-methylated, positively correlated with expression (212 CpGs, 125 genes) | p value |
|---|---|---|---|---|---|---|
| Gene Ontology | Anatomical structure development | 6.1e-28 | Developmental process | 7.8e-06 | Pattern specification process | 1.1e-13 |
| | Developmental process | 2.0e-25 | Single organism signaling | 2.4e-05 | Regionalization | 1.1e-12 |
| | Multicellular organismal process | 9.6e-24 | Signaling | 1.8e-05 | Anatomical structure development | 2.2e-11 |
| | Single multicellular organism process | 1.6e-22 | Cellular developmental process | 1.4e-05 | Single organism developmental process | 1.9e-11 |
| | Single organism signaling | 1.7e-21 | Single organism developmental process | 2.3e-05 | Anatomical structure morphogenesis | 1.8e-11 |
| | Signaling | 1.9e-21 | Anatomical structure development | 8.0e-05 | Developmental process | 1.7e-11 |
| | Cell-cell signaling | 1.7e-21 | Cell-cell signaling | 1.8e-04 | Embryonic morphogenesis | 1.1e-10 |
| | Neuron differentiation | 1.2e-20 | Cell differentiation | 2.2e-04 | Cellular developmental process | 1.8e-10 |
| | Single organism developmental process | 1.4e-19 | Synaptic transmission | 4.4e-04 | Organ development | 5.3e-10 |
| | Regulation of transcription from RNA polymerase II promoter | 1.2e-16 | Anatomical structure morphogenesis | 6.1e-04 | Single multicellular organism process | 5.6e-10 |
| InterPro | Homeobox | 3.6e-35 | Homeobox | 1.1e-04 | Homeobox | 2.1e-31 |
| Tumor suppressor genes (TSGene 2.0) | AHRR, AKR1B1, BMP2, C2orf40, CDH4, CDO1, CDX2, CNTNAP2, CSMD1, DLK1, DSC3, EBF3, EDNRB, FAT4, FOXA2, FOXC1, GALR1, GREM1, GRIN2A, ID4, IRF4, IRX1, LHX4, MAL, MIR124-2, MIR124-3, MIR125B1, MIR129-2, MIR137, MIR9-3, ONECUT1, OPCML, PAX5, PAX6, PCDH8, PHOX2A, PRKCB, PROX1, PTGDR, RASL10B, SFRP1, SFRP2, SHISA3, SLIT2, SOX7, TBX5, UNC5D, ZIC1 (48 genes) | 1.5e-03 | AKR1B1, ASCL1, BIN1, BMP4, CCDC67, CDK6, CDO1, EBF3, GSTP1, ID4, IRX1, L3MBTL4, LRRC4, MAP4K1, MME, NTRK3, PCDH10, PDLIM4, PROX1, PTGDR, RUNX3, SCGB3A1, SFRP1, SLC5A8, SLIT2, UBE2QL1, UNC5B, VIM, WT1 (29 genes) | 9.7e-02 | AMH, GATA4, HOPX, HOXB13, LHX4, LHX6, MAP4K1, ONECUT1, PAX5, RASAL1, TBX5, TP73, WT1, ZIC1 (14 genes) | 5.5e-02 |
Feature enrichment in the three subsets of CpGs differentially methylated between the LumA-M1 and LumA-M3 subgroups
CpG enrichment tests show that hyper-methylated CpGs negatively correlated with gene expression are enriched for upstream gene regions, whereas positively correlated CpGs are enriched for the gene body. All three hyper-methylated CpG groups are enriched for informatically determined enhancer elements, experimentally determined differentially methylated regions, and DNase hypersensitive sites. The p values represent hypergeometric tests for over-representation or under-representation and are FDR-corrected (significant p values are marked in bold). UTR untranslated region, DMR differentially methylated region
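The caption above combines a hypergeometric over-representation test with FDR correction. A minimal sketch of both steps, assuming the standard one-sided upper-tail hypergeometric test and the Benjamini-Hochberg procedure (the caption does not name the FDR method); `hypergeom_enrichment_p` and `benjamini_hochberg` are hypothetical names:

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(total: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided over-representation p value, P(X >= overlap).

    total:     size of the background (e.g. all CpGs on the array)
    annotated: background items carrying the feature (e.g. enhancer CpGs)
    selected:  size of the tested subset (e.g. hyper-methylated CpGs)
    overlap:   annotated items inside the tested subset
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Exact integer arithmetic is fine for a sketch but slow for array-sized backgrounds; `scipy.stats.hypergeom.sf(overlap - 1, total, annotated, selected)` gives the same upper tail, and the lower tail `hypergeom.cdf(overlap, total, annotated, selected)` covers the under-representation case mentioned in the caption.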
Multivariate Cox analysis of luminal-A subgroups for five-year survival and five-year recurrence
| Variable | Survival: hazard ratio | Survival: p value | Recurrence: hazard ratio | Recurrence: p value |
|---|---|---|---|---|
| LumA-R (1 vs 2) | 0.56 | 0.36991 | … | … |
| LumA-M (2, 3 vs 1) | … | … | 3.04 | 0.07028 |
| Age (<60 vs ≥60 years) | … | … | 1.03 | 0.96530 |
| Pathologic stage (I, II vs III, IV) | 2.12 | 0.25519 | 1.93 | 0.26992 |
| ER status | 7.17 | 0.18095 | 0.00 | 0.99575 |
| PR status | 0.47 | 0.50039 | 0.29 | 0.29092 |
| Her2 status | 1.48 | 0.72659 | 0.64 | 0.68789 |
Significant p values are marked in boldface. ER estrogen receptor, PR progesterone receptor, Her2 human epidermal growth factor receptor 2
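The hazard ratios above come from a multivariate Cox proportional hazards model. For reference, the standard textbook form of the model and its partial likelihood (general notation, not reproduced from the paper) is:

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta^\top x_i\right),
\qquad \mathrm{HR}_j = e^{\beta_j}

L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta^\top x_i\right)}
       {\sum_{k:\,t_k \ge t_i} \exp\!\left(\beta^\top x_k\right)}
```

Here x_i holds the covariates of patient i (subgroup indicator, age group, pathologic stage, ER, PR and Her2 status), t_i is the follow-up time, δ_i = 1 if the event was observed within follow-up, and h_0(t) is an unspecified baseline hazard. Each HR_j compares the two levels of covariate j with the other covariates held fixed; a hazard ratio of 0.56, for example, corresponds to β_j = ln 0.56 ≈ −0.58.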