Shumei Zhang1, Yihan Wang1, Yue Gu1, Jiang Zhu1, Ce Ci1, Zhongfu Guo2, Chuangeng Chen1, Yanjun Wei1, Wenhua Lv1, Hongbo Liu1, Dongwei Zhang2, Yan Zhang1.
Abstract
Tumour heterogeneity is an obstacle to effective breast cancer diagnosis and therapy. DNA methylation is an important regulator of gene expression; thus, characterizing tumour heterogeneity by epigenetic features can be clinically informative. In this study, we explored specific prognosis-subtypes based on DNA methylation status using 669 breast cancers from the TCGA database. Nine subgroups were distinguished by consensus clustering using 3869 CpGs that significantly influenced survival. The specific DNA methylation patterns were reflected by different races, ages, tumour stages, receptor status, histological types, metastasis status and prognosis. Compared with the PAM50 subtypes, which use gene expression clustering, DNA methylation subtypes were more elaborate and classified the Basal-like subtype into two different prognosis-subgroups. Additionally, 1252 CpGs (corresponding to 888 genes) were identified as specific hyper/hypomethylation sites for each specific subgroup. Finally, a prognosis model based on Bayesian network classification was constructed and used to classify the test set into DNA methylation subgroups, which corresponded to the classification results of the train set. These specific classifications by DNA methylation can explain the heterogeneity of previous molecular subgroups in breast cancer and will help in the development of personalized treatments for the new specific subtypes.
Keywords: DNA methylation; breast cancer; consensus clustering; molecular subtypes
Year: 2018 PMID: 29675884 PMCID: PMC6026876 DOI: 10.1002/1878-0261.12309
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 6.603
Figure 1. Criteria for the selection of the number of categories. (A) Delta area curve of consensus clustering, indicating the relative change in area under the cumulative distribution function (CDF) curve for each category number k compared with k – 1. The horizontal axis represents the category number k and the vertical axis represents the relative change in area under the CDF curve. (B) The average cluster consensus and coefficient of variation among clusters for each category number k. The blue line represents the average cluster consensus and the red line represents the coefficient of variation among clusters.
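The delta area criterion in panel (A) can be illustrated with a small sketch. The Python below is not the authors' code: the function names `cdf_area` and `delta_area` and the toy consensus values are assumptions made for illustration. It relies on the identity that, for values in [0, 1], the area under the empirical CDF equals one minus the mean; clustering packages may use a binned approximation of the CDF instead.

```python
def cdf_area(consensus_values):
    """Area under the empirical CDF of consensus values lying in [0, 1]."""
    vals = list(consensus_values)
    if not vals:
        raise ValueError("need at least one consensus value")
    if any(v < 0.0 or v > 1.0 for v in vals):
        raise ValueError("consensus values must lie in [0, 1]")
    # For X in [0, 1], the integral of P(X <= x) over [0, 1] is 1 - E[X].
    return 1.0 - sum(vals) / len(vals)


def delta_area(areas_by_k):
    """Relative change in CDF area for each k compared with the previous k."""
    ks = sorted(areas_by_k)
    return {
        k: (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
        for prev, k in zip(ks, ks[1:])
    }


# Hypothetical pairwise consensus values (upper triangle of the consensus
# matrix) for three candidate category numbers; not data from the study.
consensus_by_k = {
    2: [1.0, 0.9, 0.1, 0.0, 0.5, 0.5],
    3: [1.0, 1.0, 0.0, 0.0, 0.2, 0.2],
    4: [1.0, 1.0, 0.0, 0.0, 0.1, 0.1],
}

areas = {k: cdf_area(v) for k, v in consensus_by_k.items()}
deltas = delta_area(areas)
for k in sorted(deltas):
    print(k, round(deltas[k], 4))
```

Under this reading, a category number is a reasonable choice once the relative change stops decreasing appreciably as k grows.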
Figure 2. Consensus matrix for DNA methylation classification with the corresponding heat map. (A) The colour‐coded heatmap corresponding to the consensus matrix for k = 10 obtained by applying consensus clustering. The colour gradients were from 0 to 1, representing the degree of consensus, with white corresponding to 0 and dark blue to 1. (B) The heatmap corresponding to the dendrogram in (A), which was generated using the pheatmap function with DNA methylation classification, PAM50 classification, estrogen receptor, progesterone receptor, HER2 receptor status, TNM stage, clinicopathological stage and histological type as the annotations.
Figure 3. Survival curves of DNA methylation subtypes and the comparison of lymphocyte infiltration between DNA methylation clusters and their PAM50 classifications. (A) The survival curves of DNA methylation subtypes in the train set. The horizontal axis represents the survival time (months), and the vertical axis represents the probability of survival. The numbers in parentheses in the legend represent the number of samples in each cluster. The log‐rank test was used to assess the statistical significance of the difference. (B) Lymphocyte infiltration score distributions of nine DNA methylation clusters in the train set. The horizontal axis represents the DNA methylation clustering. (C) PAM50 subtypes with enrichment in each DNA methylation cluster. (D) The reverse orientation of (C).
The results of chi‐square test on the global level
| Clinical attributes | Subclasses | P‐value |
|---|---|---|
| Age | Young | 0.0433 |
| | Old | |
| Race | White | 0.0054 |
| | Asian | |
| | Black or African American | |
| | American Indian or Alaska Native | |
| N stage | N0 | 0.0045 |
| | N1 | |
| | N2 | |
| | N3 | |
| M stage | M0 | 0.0173 |
| | M1 | |
| Stage | Stage I | 0.0009 |
| | Stage II | |
| | Stage III | |
| | Stage IV | |
| ER | Negative | 1.0232e-32 |
| | Positive | |
| PR | Negative | 2.0291e-25 |
| | Positive | |
| HER2 | Negative | 0.0103 |
| | Positive | |
| Histological type | Infiltrating ductal carcinoma | 0.0003 |
| | Infiltrating lobular carcinoma | |
| | Medullary carcinoma | |
| | Mucinous carcinoma | |
| | Infiltrating carcinoma NOS | |
| Metastatic | Yes | 0.0117 |
| | No | |
P‐value: the P‐value of chi‐square test.
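As a rough illustration of the test behind this table, the sketch below computes the chi‐square statistic and degrees of freedom for a contingency table of one clinical attribute against cluster membership. It is a minimal example under stated assumptions: the function name `chi_square_statistic` and the toy counts are hypothetical, and the study's actual counts are not given here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts, e.g. one row per
    subclass (Young, Old) and one column per DNA methylation cluster.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of attribute and cluster.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical 2 x 2 counts (subclass x cluster); not data from the study.
toy_table = [
    [10, 20],
    [20, 10],
]
stat, dof = chi_square_statistic(toy_table)
print(round(stat, 4), dof)
```

The P‐value is the upper tail of the chi‐square distribution with `dof` degrees of freedom at `stat`; if SciPy is available, `scipy.stats.chi2_contingency` returns the statistic, P‐value, degrees of freedom and expected counts in one call (note that it applies a continuity correction to 2 x 2 tables by default).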
Figure 4. Specific hyper/hypomethylation CpG sites for each DNA methylation cluster. (A) Display of specific CpG sites for each DNA methylation prognosis subtype. The red bars and blue bars represent hypermethylation CpG sites and hypomethylation CpG sites, respectively. (B) The heat map for the specific sites in nine DNA methylation clusters.
The numbers of specific CpGs for clustering
| Cluster | Number of specific CpGs |
|---|---|
| Cluster 1 | 87 |
| Cluster 2 | 200 |
| Cluster 3 | 15 |
| Cluster 4 | 177 |
| Cluster 5 | 13 |
| Cluster 6 | 44 |
| Cluster 7 | 519 |
| Cluster 8 | 53 |
| Cluster 9 | 144 |
The confusion matrix of Bayesian network classification. Each row of the matrix represents the instances in a predicted cluster, and each column represents the instances in an actual cluster. C1–C9 are the abbreviations for clusters 1–9.
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 |
|---|---|---|---|---|---|---|---|---|---|
| C1 | | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| C2 | 2 | | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| C3 | 0 | 0 | | 3 | 3 | 0 | 1 | 1 | 0 |
| C4 | 0 | 0 | 0 | | 0 | 0 | 1 | 0 | 1 |
| C5 | 2 | 4 | 2 | 0 | | 0 | 0 | 2 | 0 |
| C6 | 1 | 1 | 0 | 0 | 0 | | 0 | 0 | 0 |
| C7 | 0 | 0 | 0 | 0 | 0 | 0 | | 1 | 0 |
| C8 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | | 0 |
| C9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
The bold text in the table represents the number of instances in each class for which the predicted cluster and the actual cluster are the same.
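A confusion matrix laid out this way (rows as predicted clusters, columns as actual clusters) can be summarized as sketched below. This is an illustrative example only: the function name `summarize_confusion` and the 3 x 3 toy matrix are hypothetical, and no values from the study's table are used.

```python
def summarize_confusion(matrix):
    """Accuracy, per-class precision and per-class recall.

    Rows are predicted classes and columns are actual classes, matching
    the orientation stated for the table above.
    """
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("confusion matrix must be square")

    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    row_sums = [sum(row) for row in matrix]        # predicted per class
    col_sums = [sum(col) for col in zip(*matrix)]  # actual per class

    accuracy = correct / total if total else None
    # Precision: share of predictions for a class that were right.
    precision = [matrix[i][i] / row_sums[i] if row_sums[i] else None
                 for i in range(size)]
    # Recall: share of actual members of a class that were recovered.
    recall = [matrix[i][i] / col_sums[i] if col_sums[i] else None
              for i in range(size)]
    return accuracy, precision, recall


# Hypothetical 3-class matrix; the diagonal holds the agreeing instances.
toy_matrix = [
    [8, 1, 0],
    [2, 7, 1],
    [0, 0, 9],
]
accuracy, precision, recall = summarize_confusion(toy_matrix)
print(round(accuracy, 3), precision, recall)
```

With rows as predictions, dividing the diagonal by the row sum gives precision and dividing by the column sum gives recall; swapping the orientation swaps the two.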
Figure 5. The prognosis model and prediction results. (A) The ROC curve displayed the sensitivity and specificity of the prognosis model. The area under the curve (AUC) reached 0.946. (B) Survival curves of nine clusters predicted from the test set using the prognosis model. The numbers in parentheses in the legend represent the number of samples in each cluster. The log‐rank test was used to assess the statistical significance of the difference. (C) PAM50 subtypes with enrichment in each DNA methylation cluster. (D) Lymphocyte infiltration score distributions of DNA methylation clusters in the test set.
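For readers unfamiliar with the AUC reported in panel (A), the sketch below computes it for a binary problem as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a hedged illustration: the function name `roc_auc`, the binary framing and the toy labels and scores are assumptions, and how the study aggregated the nine clusters into a single ROC curve is not stated here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = in the cluster, 0 = not) and classifier scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 3))
```

This pairwise form is quadratic in the number of samples, which is fine for a small test set; a rank-based computation gives the same value more efficiently on larger data.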