Liwei Meng1, Yingchun Xu1, Chaoyang Xu1, Wei Zhang1. 1. Department of Breast and Thyroid Surgery, Shaoxing People's Hospital, Shaoxing Hospital of Zhejiang University, Shaoxing, Zhejiang, People's Republic of China.
Abstract
PURPOSE: Breast cancer is the leading cause of cancer death worldwide in women. The molecular mechanism for human breast cancer is unknown. Gene microarray has been widely used in breast cancer research to identify clinically relevant molecular subtypes as well as to predict prognosis and survival. So far, the value of multigene signatures in clinical practice is unclear, and the biological importance of individual genes is difficult to detect, as the described signatures virtually do not overlap. Early prognosis of this disease, in both breast invasive ductal carcinoma (IDC) and breast ductal carcinoma in situ (DCIS), is vital in breast surgery. METHODS: This study reports gene expression profiling in large breast cancer cohorts from Gene Expression Omnibus, including GSE29044 (N=138) and GSE10780 (N=185) test series and four independent validation series GSE21653 (N=266), GSE20685 (N=327), GSE26971 (N=276), and GSE12776 (N=204). Significantly differentially expressed genes in human breast IDC and breast DCIS were detected by transcriptome microarray analysis. RESULTS: We created a set of three genes (MAMDC2, TSHZ2, and CLDN11) that were significantly correlated with disease-free survival of breast cancer patients using a univariate Cox regression model (significance level P<0.01) in a meta-analysis. Based on the risk score of the three genes, the test series patients could be separated into low-risk and high-risk groups with significantly different survival times. This signature was validated in the other three cohorts. The prognostic value of this three-gene signature was confirmed in the internal validation series and another four independent breast cancer data sets. The prognostic impact of one of the three genes, CLDN11, was confirmed by immunohistochemistry. CLDN11 was significantly overexpressed in human breast IDC as compared with normal breast tissues and breast DCIS.
CONCLUSION: Using novel gene expression profiling together with a meta-analysis validation approach, we have identified a three-gene signature with independent prognostic impact. Furthermore, CLDN11 may offer a biomarker to predict prognosis as well as a new target for prognostic and therapeutic intervention for human breast IDC.
Breast cancer is the leading cause of cancer-related death in women worldwide. Although early-stage breast cancer patients are treated by surgery, the risk of recurrence is very high. Breast cancer type and cancer grade are two of the most vital characteristics, and they are the best-established prognostic factors in breast cancer.1–5 Approximately 45%–78% of all invasive breast cancers are associated with ductal carcinoma in situ (DCIS).6 Invasive ductal carcinoma (IDC) always requires radical treatment, chemotherapy, and radiotherapy, but conservative treatment is usually sufficient for DCIS. The American Joint Committee on Cancer TNM staging system is currently the only prognostic classification used in clinical practice to select patients for adjuvant chemotherapy.7–12 However, the TNM staging system fails to predict recurrence accurately in many patients undergoing curative surgery for breast cancer. Microarray-based gene expression profiling has been successfully used in clinical cancer research to subclassify cancer, to predict prognosis, or to evaluate the response to therapy.13–16 Several studies have exploited microarray technology to investigate gene expression profiles in breast cancer, but only a small subset demonstrates clear prognostic significance. Major genes have been discovered to identify the molecular subtypes that predict the prognosis and response to additional therapy. So far, only BRCA1/2 mutation analysis has been used in clinical practice as a predictive marker for breast cancer.17–21 In this study, we evaluated the expression levels of single genes for prognostic relevance and searched for new gene signatures. To this end, we initially mined previously published gene expression microarray data from the Gene Expression Omnibus (GEO). We identified a prognostic, three-gene signature from the GSE29044 and GSE10780 test series patients and another four independent GEO cohorts using the sample-splitting method and Cox regression analysis.
We created a set of three genes (MAMDC2, TSHZ2, and CLDN11) that were significantly correlated with disease-free survival (DFS) of breast cancer patients using a univariate Cox regression model (significance level P<0.001) in a meta-analysis. Based on the risk score of the three genes, the test series patients could be separated into low-risk and high-risk groups with significantly different survival times. This signature was validated in the other three cohorts. The prognostic value of this three-gene (MAMDC2, TSHZ2, and CLDN11) signature was confirmed in the internal validation series and another three independent breast cancer sets. Gene set enrichment analysis (GSEA) suggested that risk score positively correlated with several cancer metastases-related pathways. Using novel gene expression profiling together with a meta-analysis validation approach, we have identified a three-gene signature with independent prognostic impact.
Materials and methods
Microarray data analysis of breast cancer
Affymetrix Human Genome U133plus2 Array breast cancer gene expression microarray data and the corresponding clinical data used in this study were downloaded from the publicly available GEO databases (http://www.ncbi.nlm.nih.gov/geo/): GSE29044 and GSE10780. GSE29044 consisted of 73 breast cancer and 36 adjacent disease-free tissues. GSE10780 consisted of 143 histologically normal breast tissues and 42 IDC tissues. The CEL files were downloaded from the GEO database and normalized at transcript and gene level using the Robust Multichip Average method.22 Expression Console 1.4.1 and Transcriptome Analysis Console v3.0 were used for microarray analysis.
Identification of prognostic biomarker and cluster analysis
Cluster analysis was performed using Cluster 3.0 as described. The main principle behind the method is to collect a set of items (genes or arrays) into a tree, where items are joined by very short branches if they are very similar to each other, and by increasingly longer branches as their similarity decreases. We identified a set of genes that were significantly correlated with DFS of breast cancer patients (P<0.001) by applying univariable Cox proportional hazards regression analysis to the expression data using the BRB-Array Tools.23
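The tree-building principle described above can be illustrated with a minimal sketch of agglomerative clustering. This is not the Cluster 3.0 implementation: the Euclidean distance, the average-linkage rule, the function names, and the toy profiles are all assumptions chosen for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Repeatedly join the two closest clusters; return the merge history.

    Each entry is (cluster_a, cluster_b, distance). The distance plays the
    role of the branch length: small for similar items, larger for
    dissimilar ones.
    """
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = clusters[keys[i]], clusters[keys[j]]
                # Average pairwise distance between members of the two clusters
                d = sum(euclidean(profiles[x], profiles[y])
                        for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, keys[i], keys[j])
        d, k1, k2 = best
        merged = clusters.pop(k1) + clusters.pop(k2)
        clusters["(" + k1 + "+" + k2 + ")"] = merged
        merges.append((k1, k2, d))
    return merges


# Hypothetical toy profiles, not data from this study
toy_profiles = {
    "G1": [1.0, 1.0],
    "G2": [1.1, 1.0],
    "G3": [5.0, 5.0],
    "G4": [5.2, 5.1],
}
merge_history = average_linkage(toy_profiles)
```

In this toy example, the two pairs of similar profiles are joined first at short distances, and the two resulting clusters are joined last at a much longer distance.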
Gene ontology and coexpression network analysis
All genes falling into significant temporal profiles (P<0.001), amounting to a total of 686 genes, underwent functional classification using the Gene Ontology (GO) analysis (NCBI). The significance of the GO terms assigned to these genes was assessed by Fisher’s exact test and χ2 test, as described.24,25 We present gene coexpression networks to identify interactions among genes. Based on the correlation between genes, the gene–gene interaction network was constructed, as described. For visual clarity, only the strongest correlations were drawn in these renderings. Within the gene coexpression network, nodes represent genes, and the edges between them represent the interactions between them. Within the network analysis, degree is the simplest and most important measure of the centrality of a gene within a network and determines its relative importance.
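As an illustration of degree centrality in a correlation-based network, the following sketch links two genes when the absolute Pearson correlation of their expression profiles reaches a cutoff, then counts the links per gene. The cutoff of 0.9, the function names, and the toy expression values are assumptions for illustration only; the source does not state the correlation threshold used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_degrees(expr, threshold=0.9):
    """Return the degree of each gene in a thresholded coexpression network.

    An edge is drawn between two genes when |r| >= threshold
    (the threshold value is an assumption, not taken from the paper).
    """
    degree = {gene: 0 for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


# Hypothetical toy expression profiles across four samples
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2],
    "GENE_C": [4.0, 3.0, 2.1, 1.0],
    "GENE_D": [1.0, 3.0, 1.0, 3.0],
}
degrees = coexpression_degrees(toy_expr)
```

Genes with the highest degree are the hubs of such a network; in the toy data, GENE_D is only weakly correlated with the others and remains unconnected.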
Gene set enrichment analysis
GSEA was performed by the JAVA program using MSigDB C2 CP: canonical pathways gene set collection.26 The GSEA, visualized in Cytoscape and the Enrichment Map software, was used to determine if the members of a given gene set were generally associated with risk score and was therefore performed on all mRNA genes on the Affymetrix Human Genome U133 Plus 2.0 ranked by enrichment score from most negative to most positive. A total of 1,000 random sample permutations were carried out, and the significance threshold was set at false discovery rate (FDR) <0.01. If a gene set had a positive enrichment score, the majority of its members had higher expression accompanied by a higher risk score, and the set was termed “enriched”.
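The idea of an enrichment score can be sketched as a running sum over a ranked gene list. The code below is a simplified, unweighted variant written for illustration, not the GSEA JAVA program itself; it assumes the list is ordered from the genes most positively associated with risk score to those most negatively associated, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified sketch).

    Walking down the ranked list, the running sum increases at each gene
    that belongs to the set and decreases at each gene that does not.
    The score is the largest deviation of the running sum from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partly")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: g1 is most positively associated with risk score
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})      # set at the top of the list
es_bottom = enrichment_score(ranked, {"g5", "g6"})   # set at the bottom of the list
```

A set concentrated at the top of the ranking yields a positive score, and a set concentrated at the bottom yields a negative one. Significance would still need to be assessed by permutation, as described above.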
Immunohistochemistry
Tissues for immunohistochemistry containing normal breast, breast IDC, and breast DCIS were routinely deparaffinized and rehydrated, and then were subjected to heat-induced epitope retrieval in 0.01 mM citrate buffer (pH 6.0). The slides were then incubated with Rabbit anti-Claudin 11 polyclonal antibody (1:200) at 4°C overnight. Sections were then stained with 3,3′-diaminobenzidine (Origene, Beijing, People’s Republic of China) for 2 minutes. All sections were counterstained with hematoxylin, dehydrated, and mounted.
Statistical analysis
For microarray analysis, differentially expressed genes were confirmed using a P-value threshold and FDR analysis. The threshold of truly significant genes was taken to be P-value <0.001 and FDR value <0.05. Genes were considered statistically significant if their permutation P-values were ≤0.01. To construct a predictive model, the selected genes were fitted in a multivariable Cox regression model in the test series.
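The combined P-value and FDR filter can be sketched as follows. The source does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the toy P-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumed FDR procedure; the paper only states that FDR analysis was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(p_by_gene, p_cut=0.001, fdr_cut=0.05):
    """Keep genes passing both the raw P-value and the FDR thresholds."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, q_values)
            if p_by_gene[g] < p_cut and q < fdr_cut]


# Hypothetical P-values, not results from this study
toy_p = {"GENE_A": 0.0001, "GENE_B": 0.0004, "GENE_C": 0.02, "GENE_D": 0.5}
selected = select_genes(toy_p)
```

The default thresholds mirror the values stated above (P-value <0.001 and FDR <0.05).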
Results
Gene expression profile in human breast cancer
Affymetrix Human Genome U133plus2 Array breast cancer gene expression microarray data and the corresponding clinical data used in this study were downloaded from the publicly available GEO databases. Breast cancer gene expression microarrays GSE29044 and GSE10780 were analyzed for potential transcriptome changes in IDC using Expression Console 1.4.1 and Transcriptome Analysis Console v3.0. Hierarchical clustering showed that a total of 1,843 genes were differentially expressed (P<0.01) in IDC compared with normal breast tissue and breast DCIS, as shown in Figure 1.
Figure 1
Gene expression profile of human breast cancer.
Gene ontology and pathway analysis
Significantly differentially expressed genes in IDC were then subjected to GO and pathway analysis. We found that many of these genes were related to cell cycle and carcinogenesis. Using the Kyoto Encyclopedia of Genes and Genomes and Gene Map annotator and Pathway Profiler databases, the significant signaling pathways were categorized into different groups including the PI3K-Akt signaling pathway, focal adhesion, metabolic pathways, Jak-STAT signaling pathway, and VEGF signaling pathway. Important genes and pathways involved in this process are shown in Figure 2A and B.
Figure 2
Gene ontology and pathway analysis of differentially expressed genes in breast cancer.
Coexpression network and candidate biomarker
To determine which gene or genes may serve as biomarkers in the development of IDC, all significantly differentially expressed genes in IDC were then used to construct a gene coexpression network. In this cancer coexpression network, we found that MAMDC2, TSHZ2, CLDN11, SPRY2, ACTA2, CHRDL1, ABCA8, and others play a key role in the development of IDC. Gene networks were constructed as shown in Figure 3. The degree of a node describes the number of links one gene has to others within the gene network. Interestingly, central to this network was a set of three genes (MAMDC2, TSHZ2, and CLDN11), which were directly linked to the 20 neighboring genes they interacted with (Figure 3).
Figure 3
mRNA–mRNA coexpression network.
Notes: The differential genes were selected as candidate genes as a function of IDC by constructing a gene coexpression network with k-core algorithm. MAMDC2, TSHZ2, and CLDN11 were the key genes in the gene network. Node size represents the degree centrality.
Abbreviation: IDC, invasive ductal carcinoma.
Survival analysis in the expression data
A univariable Cox proportional hazards regression model was used to identify genes with prognostic relevance. We found a set of three genes (MAMDC2, TSHZ2, and CLDN11) that were significantly correlated with the DFS of IDC patients (P<0.001; Table 1). Based on the expression of these three genes, we created a risk-score formula to predict the survival of breast cancer patients: Risk score = (1.05326 × expression of MAMDC2) + (1.13029 × expression of TSHZ2) + (0.73615 × expression of CLDN11). We then calculated the risk score for each patient in the test series. Using the median risk score as the cutoff point, patients in the test series were divided into a high-risk group (N=158) and a low-risk group (N=165). The high-risk group had significantly shorter median DFS than the low-risk group (log-rank test P<0.01), as shown in Figure 4.
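The risk-score calculation and the median split can be sketched in a few lines. The coefficients are those given in the formula above; the patient identifiers and expression values are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the source does not say how ties were handled.

```python
from statistics import median

# Coefficients from the risk-score formula in the text
COEFFICIENTS = {"MAMDC2": 1.05326, "TSHZ2": 1.13029, "CLDN11": 0.73615}


def risk_score(expression):
    """Weighted sum of the three genes' expression values."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def split_by_median(scores):
    """Assign each patient to a high- or low-risk group at the median score.

    Scores equal to the median go to the low-risk group (an assumption).
    """
    cutoff = median(scores.values())
    groups = {pid: ("high" if score > cutoff else "low")
              for pid, score in scores.items()}
    return cutoff, groups


# Hypothetical expression values for four illustrative patients
toy_patients = {
    "P1": {"MAMDC2": 2.0, "TSHZ2": 2.0, "CLDN11": 2.0},
    "P2": {"MAMDC2": 1.0, "TSHZ2": 1.0, "CLDN11": 1.0},
    "P3": {"MAMDC2": 3.0, "TSHZ2": 1.0, "CLDN11": 2.0},
    "P4": {"MAMDC2": 0.5, "TSHZ2": 0.5, "CLDN11": 0.5},
}
scores = {pid: risk_score(expr) for pid, expr in toy_patients.items()}
cutoff, groups = split_by_median(scores)
```

To apply the signature to a new cohort, the same `risk_score` function would be used, with the cutoff either recomputed from that cohort or carried over from the test series.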
Table 1
Three genes that were significantly associated with survival in the test data set
| Gene | Coefficient | Hazard ratio | P-value (Cox) | P-value (permutation) |
|---|---|---|---|---|
| MAMDC2 | 1.05326 | 0.48321 | 3.29E–05 | 0.000103 |
| TSHZ2 | 1.13029 | 0.66413 | 2.16E–04 | 0 |
| CLDN11 | 0.73615 | 0.76245 | 1.53E–03 | 0.000234 |
Figure 4
Kaplan–Meier estimates of the survival in the GEO test data set.
Note: Kaplan–Meier curves for GSE29044 (N=138) and GSE10780 (N=185) test data set.
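For readers unfamiliar with Kaplan–Meier estimates, the following sketch shows how a survival curve and a median survival time are obtained from follow-up times and event indicators. It is an illustrative implementation with hypothetical data, not the software used to produce Figure 4, and the function names are assumptions.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) steps.

    events: 1 if the event (e.g. recurrence) was observed at that time,
            0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up times (months) and event indicators
toy_times = [5, 8, 12, 12, 20]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Computing one curve per risk group and comparing them with a log-rank test gives the kind of comparison shown in the figure.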
Validation for survival prediction in independent data sets
To validate the risk-score formula, risk scores were calculated for four independent, publicly available breast cancer gene expression data sets: GSE21653 (N=266), GSE20685 (N=327), GSE26971 (N=276), and GSE12776 (N=204). Using the three-gene risk-score formula with the cutoff point from the test series, we divided patients into a high-risk and a low-risk group. Consistent with the findings described earlier, patients in the high-risk group had significantly shorter median DFS than those in the low-risk group, as shown in Figure 5.
Figure 5
Kaplan–Meier estimates of the survival in the GEO validation data set.
Notes: (A) Kaplan–Meier curves for GSE21653 validation data set (N=266); (B) Kaplan–Meier curves for the entire GSE20685 patients (N=327). (C) Kaplan–Meier curves for GSE26971 patients (N=276); (D) Kaplan–Meier curves for GSE12776 patients (N=204).
Evaluation of CLDN11 expression as a prognostic marker in IDC
First, the prognostic impact of one of the three genes, claudin 11 (CLDN11), was confirmed by immunohistochemistry. We demonstrated that CLDN11 is overexpressed in human IDC as compared with normal breast tissue and DCIS. Importantly, CLDN11 expression was markedly increased in IDC compared with lower grade breast cancer, as shown in Figure 6A. We further measured the RNA expression level of CLDN11 by quantitative real-time polymerase chain reaction (qRT-PCR) in a set of 25 matched samples and detected significantly increased levels of CLDN11 mRNA in breast cancer tissues compared to the nontumor tissue (Figure 6B). High CLDN11 expression was also found to correlate with shorter overall survival (P=0.012) of breast cancer patients (Figure 6C). These results demonstrated that CLDN11 may offer a biomarker to predict prognosis as well as a new target for prognostic and therapeutic intervention for human breast IDC.
Figure 6
CLDN11 is overexpressed in invasive ductal carcinoma.
Notes: (A) Expression of CLDN11 in normal breast, and ductal carcinoma in situ. (B) mRNA expression of CLDN11 in normal tissue and breast cancer tissue was analyzed by quantitative real-time polymerase chain reaction (qRT-PCR). P<0.05 indicates a significant difference between two groups. (C) Kaplan–Meier plots of CLDN11 expression in 25 cases of breast cancer patients. Overall survival was compared using the log-rank test.
Abbreviations: DCIS, ductal carcinoma in situ; IDC, invasive ductal carcinoma.
Discussion
Breast cancer is a multistep and complex disease that has special biological features and clinical behaviors. DCIS is a noninvasive form of breast cancer. Approximately 65% of all invasive breast cancers are associated with DCIS.27–31 To what extent DCIS and IDC share low-risk susceptibility genes, and whether the strength of association differs for shared genes, is not clear. Efforts are focused on characterizing the molecular subtype genes of these two tumors to diagnose them and to define new targets for therapy. Hence, early classification of IDC and DCIS is a vital step in breast cancer surgery, especially to determine whether DCIS is associated with tumor cell microinvasion.32 Previous studies have exploited microarray technology to investigate gene expression profiles in breast cancer, but only a small subset demonstrates clear prognostic significance. Major genes have been identified to define molecular subtypes that predict prognosis and response to additional therapy. So far, only BRCA1/2 mutation analysis has been used in clinical practice as a predictive marker for breast cancer. In this study, we evaluated the expression levels of single genes for prognostic relevance and searched for new gene signatures. To this end, we initially mined previously published gene expression microarray data from the GEO. We identified a prognostic, three-gene signature from the GSE39582 test series patients, and another two independent GEO cohorts using the sample-splitting method and Cox regression analysis. We created a set of three genes that were significantly correlated with breast cancer patients’ DFS using a univariate Cox regression model (significance level P<0.01) in a meta-analysis. Based on the risk score of the three genes, the test series patients could be separated into low-risk and high-risk groups with significantly different survival times. This signature was validated in the other three cohorts.
The prognostic value of this three-gene signature was confirmed in the internal validation series and another three independent breast cancer sets. GSEA suggested that risk score positively correlated with several cancer metastases-related pathways. Using novel gene expression profiling together with a meta-analysis validation approach, we have identified a three-gene signature with independent prognostic impact. In our study, we identified a prognostic, three-gene signature from the GSE29044 and GSE10780 test series patients and another four independent GEO cohorts using the sample-splitting method and Cox regression analysis. We created a set of three genes (MAMDC2, TSHZ2, and CLDN11) that were significantly correlated with DFS of breast cancer patients using a univariate Cox regression model (significance level P<0.001) in a meta-analysis. Based on the risk score of the three genes, the test series patients could be separated into low-risk and high-risk groups with significantly different survival times. This signature was validated in the other three cohorts. The prognostic value of this three-gene (MAMDC2, TSHZ2, and CLDN11) signature was confirmed in the internal validation series and another three independent breast cancer sets. GSEA suggested that risk score positively correlated with several cancer metastases-related pathways. Using novel gene expression profiling together with a meta-analysis validation approach, we have identified a three-gene signature with independent prognostic impact.