
Application of a co‑expression network for the analysis of aggressive and non‑aggressive breast cancer cell lines to predict the clinical outcome of patients.

Ling Guo1, Kun Zhang1, Zhitong Bing2.   

Abstract

Breast cancer metastasis is a major challenge in the clinical treatment of patients with breast cancer, and developing therapies requires an understanding of the mechanisms of metastasis. Classifying the aggressiveness of breast cancer is important both for biological study and for clinical decisions. Although aggressive and non‑aggressive breast cancer cells can be easily distinguished among cell lines, they are very difficult to distinguish in clinical practice. The aim of the current study was to use gene expression analysis of breast cancer cell lines to predict the clinical outcomes of patients with breast cancer. Weighted gene co‑expression network analysis (WGCNA) is a powerful method that accounts for correlations between genes and extracts modules of co‑expressed genes from large expression datasets. Therefore, WGCNA was applied to explore the differences in sub‑networks between aggressive and non‑aggressive breast cancer cell lines. The connections whose topological overlap differed most between the two groups carry potential information for understanding the mechanisms of aggressiveness. The results show that the blue and red modules were significantly associated with the biological processes of aggressiveness. The sub‑network, which consisted of TMEM47, GJC1, ANXA3, TWIST1 and C19orf33 in the blue module, was associated with an aggressive phenotype. The sub‑network of LOC100653217, CXCL12, SULF1, DOK5 and DKK3 in the red module was associated with a non‑aggressive phenotype. In order to validate the hazard ratio of these genes, a prognostic index was constructed to integrate them and was examined using data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Patients with breast cancer from TCGA in the high‑risk group had a significantly shorter overall survival time compared with patients in the low‑risk group (hazard ratio=1.231, 95% confidence interval=1.058‑1.433, P=0.0071, by the Wald test). A similar result was obtained from the GEO database. The findings may provide a novel strategy for measuring cancer aggressiveness in patients with breast cancer.

Year:  2017        PMID: 28944917      PMCID: PMC5779881          DOI: 10.3892/mmr.2017.7608

Source DB:  PubMed          Journal:  Mol Med Rep        ISSN: 1791-2997            Impact factor:   2.952


Introduction

Breast cancer is the most commonly diagnosed malignant cancer in women. Generally, adjuvant therapy is an effective way to improve patient survival and affects patient quality of life (1). However, drug resistance and metastasis remain important problems in breast cancer therapy. Therefore, uncovering the molecular mechanisms of breast cancer cell metastasis is urgently required and may be useful for therapy. Many studies have successfully investigated the metastatic nature of breast cancer through basic research (molecular and genetic analyses), and various novel genes that are involved in breast cancer cell metastasis have been identified (2–4). Although an individual gene or protein can have an important role in the metastasis of breast cancer cells, determining individual gene expression levels does not provide a comprehensive understanding of cancer cell metastasis (5). Weighted gene co-expression network analysis (WGCNA) (6) is a powerful tool for examining the gene correlation structure within gene expression data. The weighted gene co-expression network is an intuitive network concept in which ‘nodes’ represent gene expression vectors over tissues/conditions and ‘edges’ are weighted by correlations (typically the Pearson correlation coefficient) between the connected nodes. WGCNA can be used to identify modules of highly correlated genes without pre-assigning a ‘hard’ threshold to decide whether an edge should be drawn between two nodes, to summarize the identified modules by the module eigengene, to relate eigengene networks to one another and to external sample traits, and to calculate module membership measures (7). WGCNA has been successfully applied to various types of cancer, including glioblastoma (8), breast cancer (9), prostate cancer (10) and lung cancer (11). In breast cancer, Presson et al (9) applied WGCNA in 2011 to investigate the relationship between tissue microarray data and clinical traits.
The study identified a rule for predicting the survival outcome of patients with breast cancer (9). Clarke et al (12) utilized WGCNA to identify 11 coregulated gene clusters across 2,342 breast cancer samples in 2013. In addition, the study found several upregulated genes that were correlated with a poor outcome for patients with breast cancer, for example, potassium channel subfamily K member 5. In the same study, an online database was developed to allow users to retrieve co-expression patterns and survival analyses (12). Hua et al (13) used WGCNA to identify specialized microRNA-microRNA networks for two breast cancer subtypes in 2013. However, to the best of our knowledge, no study has previously compared the co-expression network of aggressive breast cancer cells with that of non-aggressive breast cancer cells. In the present study, WGCNA was used to reveal shared and unique properties of aggressive and non-aggressive breast cancer groups by comparing the co-expression networks of the two groups. Modules within the gene expression data of aggressive and non-aggressive breast cancer were identified. The aggressive group had six modules and the non-aggressive group had three modules. Gene Ontology (GO) enrichment demonstrated that the blue and red modules in the aggressive group were closely associated with tumor aggressiveness. To analyze the signature co-expression network of the aggressive group, the genes of its blue and red modules were selected and the corresponding genes were identified in the co-expression network of the non-aggressive group. Additionally, the hub genes (the five nodes with the strongest connections to other nodes) were filtered to analyze the difference between the aggressive and non-aggressive cell lines. The aim was to identify the networks that differed most significantly between the two groups.
The results demonstrated that certain genes in the blue module were associated with metastasis, including gap junction γ-1 protein (GJC1), Annexin A3 (ANXA3) and Twist-related protein 1 (TWIST1), which were present in the aggressive group and absent in the non-aggressive group. In the red module, a suppressor of aggressiveness, Dickkopf-related protein 3 (DKK3), had a weak connection in the aggressive group and a strong connection in the non-aggressive group. Therefore, this study provides new insight into the differences in the co-expression networks between aggressive and non-aggressive breast cancer. Furthermore, the genes obtained from WGCNA were validated using data from patients with breast cancer in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases.

Materials and methods

Sample collection

Generally, lymph-node metastasis and distant metastasis are considered markers for distinguishing aggressive from non-aggressive breast cancer. Other studies considered tumor relapse as a marker for distinguishing metastatic from non-metastatic disease (5). In fact, patient tissues are so complex that it is difficult to distinguish metastatic from non-metastatic cancer. Thus, breast cancer cell lines, which are easily separated into aggressive and non-aggressive groups, were used in the current study. We divided the breast cancer cell lines into an aggressive group (HCC202, Hs578T, MDA-MB-435, BT549 and MDA-MB-231) and a non-aggressive group (BT474, MCF7, MDA-MB-453, SUM225 and SKBR3) by SATB1 expression (14). The raw expression data of breast cancer cell lines were obtained from the GEO database (www.ncbi.nlm.nih.gov/geo) under the Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2) platform (15). In summary, we found 27 aggressive breast cancer cell line samples and 38 non-aggressive breast cancer cell line samples. The list of all samples is presented in Table I.
Table I.

All samples of aggressive and non-aggressive breast cancer cell lines.

A, Non-aggressive

GEO no. | Cell line name
GSM1067677 | MCF7
GSM1230317 | MCF7
GSM1230347 | BT474
GSM1273928 | MCF7
GSM1273929 | MCF7
GSM1298685 | MCF7
GSM1298686 | MCF7
GSM1298687 | MCF7
GSM1374661 | MDA-MB-453
GSM156771 | MCF10A
GSM212661 | MCF7
GSM286756 | MCF7
GSM286757 | MCF7
GSM286758 | MCF7
GSM286762 | MCF7
GSM286763 | MCF7
GSM286764 | MCF7
GSM286768 | MCF7
GSM286769 | MCF7
GSM286770 | MCF7
GSM297803 | MCF7
GSM436499 | MCF7
GSM436500 | MCF7
GSM436501 | MCF7
GSM678802 | MCF7
GSM678803 | MCF7
GSM678804 | MCF7
GSM699776 | MCF7
GSM699777 | MCF7
GSM803623 | MCF7
GSM803682 | MCF7
GSM803741 | MCF7
GSM820808 | HMEC
GSM820809 | HMEC
GSM820810 | HMEC
GSM984494 | BT474
GSM984498 | MCF7
GSM984499 | SKBR3

B, Aggressive

GEO no. | Cell line name
GSM1374510 | HCC202
GSM1374550 | Hs578T
GSM573291 | MDA-MB-231
GSM573292 | MDA-MB-231
GSM573293 | MDA-MB-231
GSM596523 | MDA-MB-231
GSM596524 | MDA-MB-231
GSM596525 | MDA-MB-231
GSM803625 | MDA-MB-231
GSM803626 | MDA-MB-435
GSM803684 | MDA-MB-231
GSM803685 | MDA-MB-435
GSM803744 | MDA-MB-435
GSM820814 | MDA-MB-231
GSM820815 | MDA-MB-231
GSM820816 | MDA-MB-231
GSM839353 | MDA-MB-231
GSM839354 | MDA-MB-231
GSM839355 | MDA-MB-231
GSM843477 | BT549
GSM843478 | BT549
GSM843479 | BT549
GSM870207 | MDA-MB-231
GSM870208 | MDA-MB-231
GSM870209 | MDA-MB-231
GSM870210 | MDA-MB-231
GSM984501 | Hs578T

GEO, gene expression omnibus.

Data pre-processing

Affymetrix Expression Console software was used to normalize the raw data with the Robust Multi-array Average algorithm. For computational reasons, network analysis was limited to the 4,000 most varying probe sets. Although some genes are represented by multiple probe sets and other probe sets are not fully annotated, for consistency, probe sets are referred to as ‘genes’ throughout the study, unless otherwise noted. The validation data, generated on the Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2) platform, were pre-processed with the same method as the cell line samples.
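
The restriction to the most varying features can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which used Affymetrix Expression Console and R); ranking by population variance, the function name `top_variable_rows` and the toy identifiers are assumptions made for illustration only.

```python
from statistics import pvariance


def top_variable_rows(expr, n_keep):
    """Return the IDs of the n_keep rows with the largest variance.

    expr maps a feature ID to its normalized expression values across samples.
    Variance as the ranking criterion is an assumption of this sketch.
    """
    ranked = sorted(expr, key=lambda fid: pvariance(expr[fid]), reverse=True)
    return ranked[:n_keep]


# Toy data (hypothetical IDs and values); the study kept 4,000 features.
expr = {
    "probe_a": [5.0, 5.1, 4.9, 5.0],
    "probe_b": [2.0, 8.0, 3.0, 9.0],
    "probe_c": [6.0, 7.0, 6.5, 7.5],
}
selected = top_variable_rows(expr, 2)
print(selected)
```

In this toy input, `probe_a` is nearly constant across samples and is therefore the feature that is dropped.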

Construction of WGCNA

WGCNA, implemented as a package for the R software environment (http://www.r-project.org/), was employed to construct the gene co-expression network and identify the co-expression modules (6,16,17). Highly connected module genes are represented and summarized by their first principal component, which is called the module eigengene (7). The data sets used for gene co-expression network construction consisted of 27 aggressive and 38 non-aggressive samples, respectively. For the breast cancer data set, a signed weighted network adjacency matrix is defined as:
$$a_{ij} = \left| \frac{1 + \mathrm{cor}(x_i, x_j)}{2} \right|^{\beta} \quad (1)$$
where $x_i$ and $x_j$ are numeric vectors whose entries report the expression values of genes i and j across the samples. To construct the networks, a measure of connection strength, or adjacency, is defined for each pair of genes i and j and denoted by $a_{ij}$. A mathematical constraint on $a_{ij}$ is that its values must be between 0 and 1. The power β is a soft-thresholding parameter of the adjacency function that emphasizes high positive correlations at the expense of low correlations and thereby yields a weighted network; its value needs to be determined. In WGCNA theory, only parameter values that lead to a network satisfying scale-free topology at least approximately are considered, and the scale-free topology fitting index (R^2) depends on the threshold β. A major advantage of weighted correlation networks is that they are highly robust with regard to the choice of β (16). Generally, the topology of the weighted gene co-expression network is constructed based on the hypothesis of a scale-free network. In the present study, when the soft-thresholding powers for the aggressive and non-aggressive breast cancer cell lines were 12 and 6, respectively, the topology of the two weighted gene co-expression networks was consistent with the topological structure of scale-free networks.
Thus, power=12 and power=6 were selected as the final parameters for the aggressive and non-aggressive groups of breast cancer cell lines, respectively. In the co-expression network, the genes represent the nodes and the aij represent the edges. The value of aij represents the connection strength of the edge. The overall connectivity for each gene (k) is the sum of the connection strengths (|correlation|^β) between that gene and all other 1,810 genes in the network, scaled to between 0 and 1. The intramodular connectivity for each gene (k) is the sum of the connection strengths between that gene and all genes in its module, scaled to between 0 and 1.
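
The construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WGCNA R implementation: the signed form ((1 + cor)/2)^β and the unsigned form |cor|^β are both offered through a hypothetical `signed` flag because the text mentions both, connectivity is scaled by its maximum, and all function names and toy values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors.

    Constant vectors would divide by zero and should be filtered beforehand.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # guard against rounding outside [-1, 1]


def adjacency(expr, beta, signed=True):
    """Soft-thresholded adjacency matrix with a zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(expr[i], expr[j])
            similarity = (1.0 + r) / 2.0 if signed else abs(r)
            adj[i][j] = adj[j][i] = similarity ** beta
    return adj


def scaled_connectivity(adj):
    """Sum of connection strengths per gene, divided by the maximum."""
    k = [sum(row) for row in adj]
    k_max = max(k)
    return [value / k_max for value in k]


# Toy expression vectors for three genes across four samples (hypothetical).
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]
adj_signed = adjacency(genes, beta=6, signed=True)
adj_unsigned = adjacency(genes, beta=6, signed=False)
k_scaled = scaled_connectivity(adj_signed)
```

Genes 0 and 2 are perfectly anti-correlated, so their connection strength is 0 in the signed network and 1 in the unsigned network, which is the practical difference between the two adjacency definitions.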

Gene Ontology (GO) enrichment

The annotations and functions of proteins were obtained from the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources 6.7 (http://david.abcc.ncifcrf.gov/home.jsp) (18,19). GO terms assigned a Benjamini-Hochberg adjusted P<0.05 by DAVID were deemed to be enriched over the background gene set. In this study, each module of the aggressive group was submitted into DAVID for GO enrichment.
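
For readers unfamiliar with the Benjamini-Hochberg adjustment, the following minimal Python sketch shows the standard step-up procedure. DAVID performs this calculation internally; the function name and the toy P-values below are illustrative assumptions and do not come from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P-values for four hypothetical GO terms.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
enriched = [i for i, p in enumerate(adj_p) if p < 0.05]
```

With these toy values only the first term remains below the 0.05 cut-off after adjustment.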

Specific network analysis and visualization

To identify pairs of genes with high ‘topological overlap’ (TO) in aggressive breast cancer (agg) and low TO in non-aggressive breast cancer (nonagg) in given modules, for each pair of genes i and j we defined the aggressive group specificity measure (ASij) as follows:
$$AS_{ij} = \frac{TO_{ij}^{agg} / \mathrm{mean}(TO^{agg})}{TO_{ij}^{agg} / \mathrm{mean}(TO^{agg}) + TO_{ij}^{nonagg} / \mathrm{mean}(TO^{nonagg})} \quad (2)$$
where mean (TO) represents the mean pairwise TO value in a given module for aggressive breast cancer or non-aggressive breast cancer. Connections for which the value of this ratio exceeded 0.8 were deemed present in the aggressive group and absent in the non-aggressive group.
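
A minimal Python sketch of this comparison is given below. It assumes the standard topological overlap of Zhang and Horvath for unsigned weighted networks, and it assumes that the specificity measure is the ratio of mean-normalized TO in the aggressive group to the sum of the mean-normalized TO values in both groups; the function names and the toy adjacency matrices are hypothetical.

```python
def topological_overlap(adj):
    """Topological overlap matrix for a symmetric adjacency with values in [0, 1]."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j]
            )
    return tom


def mean_pairwise(matrix):
    """Mean of the upper-triangle (pairwise) entries."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


def specificity(to_agg, to_nonagg):
    """Assumed specificity ratio for every gene pair (i, j) with i < j."""
    mean_agg, mean_non = mean_pairwise(to_agg), mean_pairwise(to_nonagg)
    n = len(to_agg)
    result = {}
    for i in range(n):
        for j in range(i + 1, n):
            a = to_agg[i][j] / mean_agg
            b = to_nonagg[i][j] / mean_non
            result[(i, j)] = a / (a + b) if (a + b) > 0 else 0.0
    return result


# Toy adjacency matrices for three genes in one module (hypothetical values).
adj_agg = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
adj_nonagg = [[0.0, 0.05, 0.6], [0.05, 0.0, 0.1], [0.6, 0.1, 0.0]]

to_agg = topological_overlap(adj_agg)
to_nonagg = topological_overlap(adj_nonagg)
spec = specificity(to_agg, to_nonagg)
specific_pairs = [pair for pair, value in spec.items() if value > 0.8]
```

In the toy data only the pair (0, 1) exceeds the 0.8 cut-off, because it is strongly connected in the first matrix and weakly connected in the second.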

Filter and restrict co-expression network

To further improve the identification of strong connections in given modules, the analysis was restricted to genes for which k was >0.5. Furthermore, for the network in given modules, the top 20% of gene pairs by weight were selected.

Hub genes validation in clinical data

Breast cancer gene expression and clinical data were downloaded from The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/) on April 2, 2016. Each sample represents a case in the TCGA data set. The three criteria used to select desired samples were as follows: i) Patients both with clinical data and gene expression were selected; ii) survival time of patients was more than 30 days; iii) all gene expressions were assayed by next-generation sequencing technologies. The three criteria resulted in 1,132 samples. The validation data set was obtained from GEO (GSE3494) that contains 262 tissue samples of patients with breast cancer. The validation data set was divided into metastatic and non-metastatic groups by the clinical traits of positive and negative lymph node metastasis. The groups contained 84 metastatic samples and 178 non-metastatic samples.

Survival analysis of hub genes

Univariate Cox proportional hazard regression was used to compute the hazard ratio (HR) and P-value for each hub gene obtained from the co-expression network analysis. P≤0.05 was considered to indicate a significant association with survival. Genes with HR>1 were considered to be high-risk genes, while genes with HR<1 were defined as risk-reducing genes. The Wald test was employed to assess the difference between two groups associated with time to an event endpoint (20). The prognostic index (PI) is an integrated indicator of the hub genes for each patient with breast cancer in the TCGA or GEO database. The PI is a linear combination of the gene expression values, each multiplied by its univariate Cox regression coefficient. To integrate the gene indicators for each patient, a weighted prognostic index (WPI) was defined as follows (21):
$$PI = \sum_{i} Coef_i \times X_i, \qquad WPI = \frac{PI - \mathrm{mean}(PI)}{\mathrm{SD}(PI)}$$
where $Coef_i$ is the univariate Cox proportional hazards regression coefficient of the ith gene and $X_i$ is the log2-transformed expression value of the ith gene. Mean (PI) and SD (PI) represent the mean value and standard deviation of the PI, respectively.
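
The calculation of the PI and WPI can be sketched in a few lines of Python. This is an illustrative sketch only: the coefficients and expression values are invented, the sample standard deviation is assumed, and splitting patients at WPI=0 is a hypothetical cut-off, since the threshold used for the risk groups is not stated here.

```python
from statistics import mean, stdev


def prognostic_index(coefs, expression):
    """Linear combination of Cox coefficients and log2 expression values."""
    return sum(c * x for c, x in zip(coefs, expression))


def weighted_prognostic_index(coefs, patients):
    """Standardize each patient's PI by the cohort mean and standard deviation."""
    pis = [prognostic_index(coefs, p) for p in patients]
    mu, sd = mean(pis), stdev(pis)  # sample SD is an assumption of this sketch
    return [(pi - mu) / sd for pi in pis]


# Hypothetical Cox coefficients for three genes and log2 expression for four patients.
coefs = [0.15, 0.18, -0.05]
patients = [
    [8.0, 6.0, 7.0],
    [9.5, 7.5, 6.0],
    [7.0, 5.0, 8.0],
    [10.0, 8.0, 5.5],
]
wpi = weighted_prognostic_index(coefs, patients)
# Hypothetical cut-off: WPI above 0 is labelled high-risk.
groups = ["high" if value > 0 else "low" for value in wpi]
```

Because the WPI is a standardized score, its mean over the cohort is zero, so a cut-off at zero splits patients around the average PI.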

Results

Co-expression network of aggressive group and non-aggressive group

The gene co-expression networks were constructed from microarray data consisting of 27 aggressive cell line samples and 38 non-aggressive cell line samples (Table I). To examine the difference between the two groups of breast cancer, the overlap between the two groups was determined. A total of 1,811 genes were derived from the 4,000 genes with the most variance. All possible pairwise correlations were calculated for the 1,811 genes in the aggressive and non-aggressive cell lines in parallel and converted into measures of connection strength by taking their absolute values and raising them to a power, β (16). Summing the connection strengths for each gene with all other genes resulted in a number termed network connectivity (k), which represents how strongly that gene is connected to all other genes in the network. To identify modules of co-expressed genes, genes with similar patterns of connection strengths to other genes, or high TO, were identified (22). WGCNA was employed to calculate TO and to cluster genes on this basis for the aggressive and non-aggressive groups, identifying six distinct gene co-expression modules in aggressive samples and three co-expression modules in non-aggressive samples (Fig. 1).
Figure 1.

Network analysis of gene expression in aggressive and non-aggressive breast cancer cell lines. Distinct modules of co-expressed genes in (A) aggressive and (B) non-aggressive breast cancer cell lines. Dendrograms were produced by average linkage hierarchical clustering of 1,811 genes based on topological overlap. Modules were assigned colors as indicated in the horizontal bar beneath the aggressive dendrogram. The bottom color bar represents the modules after merging. Classical multidimensional scaling plots in three dimensions depict the relative size and cohesion of modules in the aggressive and non-aggressive groups.

As presented in Fig. 1, there were 1,811 overlapping genes in the different clusters in aggressive and non-aggressive groups. In the present study, the size was restricted to a minimum of 30 genes in one module. The aggressive group contained six modules (excluding the grey color module) and non-aggressive group contained three modules (excluding grey color module). For investigation of the topology of the co-expression network difference between aggressive and non-aggressive cell lines, the connectivity of both groups was calculated using the R and WGCNA package (Fig. 2).
Figure 2.

Connectivity of aggressive and non-aggressive breast cancer comparison. (A) The slope of the curve changes greatly at 12 in aggressive breast cancer cells. Power=12 led to the aggressive network satisfying scale-free topology. (B) The slope of the curve changes greatly at 6 in non-aggressive breast cancer cells. Power=6 led to non-aggressive network satisfying scale-free topology. (C) Power=12 was used for the aggressive group and depicted the scale-free topology and (D) power=6 was used for the non-aggressive group and depicted the scale-free topology. The black curve corresponds to scale-free topology and the red curve corresponds to truncated scale-free topology. (E) Spearman's rank correlation was used for comparing network connectivity between aggressive and non-aggressive. The value of Spearman's rank correlation (rho) is 0.16 and P=5.133×10^-12.

As shown in Fig. 2, rho=0.16 and P=5.133×10^-12, which indicates a statistically significant correlation between the network connectivity of the two types of cell lines. This association was examined further using Pearson correlation, which produced a correlation coefficient of 0.060 and P=0.010. Although P<0.05, the correlation coefficients demonstrated only a weak positive correlation. The results indicated that the two types of cell lines have specific co-expression networks.
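
The comparison of connectivity between the two networks can be illustrated with a small Python sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. The function names and the toy connectivity values are assumptions; the study itself used R.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scaled connectivity of five genes in the two networks.
k_agg = [0.9, 0.4, 0.7, 0.1, 0.5]
k_nonagg = [0.2, 0.3, 0.8, 0.1, 0.6]
rho = spearman(k_agg, k_nonagg)
rho_monotonic = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```

The second call illustrates that Spearman's rho measures monotonic rather than strictly linear association: a squared relationship still yields a coefficient of 1.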

GO enrichment for both groups

For investigating the biological process of each module in aggressive and non-aggressive cell lines, DAVID was used for analysis. Table II presents the top five GO terms in each module. The six modules were distributed in different biological processes.
Table II.

List of the top GO terms in the most significant functional clusters from the Database for Annotation, Visualization and Integrated Discovery for each network module.

A, Aggressive breast cancer cell lines

Top five terms | No. of genes in ME | P-value | FDR
Blue module | 363 | |
  GO:0001501: skeletal system development | | 1.05×10^-8 | 1.83×10^-5
  GO:0007155: cell adhesion | | 7.15×10^-8 | 1.25×10^-4
  GO:0022610: biological adhesion | | 7.39×10^-7 | 1.29×10^-3
  GO:0001568: blood vessel development | | 3.27×10^-6 | 5.70×10^-3
  GO:0001944: vasculature development | | 3.73×10^-6 | 6.50×10^-3
Brown module | 359 | |
  GO:0048545: response to steroid hormone stimulus | | 5.91×10^-9 | 9.53×10^-6
  GO:0008285: negative regulation of cell proliferation | | 6.08×10^-9 | 9.82×10^-6
  GO:0009725: response to hormone stimulus | | 1.27×10^-6 | 2.04×10^-3
  GO:0042127: regulation of cell proliferation | | 6.36×10^-6 | 1.02×10^-2
  GO:0009719: response to endogenous stimulus | | 8.54×10^-5 | 1.39×10^-1
Green module | 183 | |
  GO:0007167: enzyme linked receptor protein signaling pathway | | 4.76×10^-4 | 7.57×10^-1
  GO:0001525: angiogenesis | | 6.42×10^-4 | 1.02
  GO:0009611: response to wounding | | 7.10×10^-4 | 1.13
  GO:0048514: blood vessel morphogenesis | | 1.04×10^-3 | 1.64
  GO:0001568: blood vessel development | | 1.8×10^-3 | 2.85
Red module | 74 | |
  GO:0030030: cell projection organization | | 4.04×10^-5 | 6.34×10^-2
  GO:0034329: cell junction assembly | | 1.37×10^-4 | 2.14×10^-1
  GO:0006928: cell motion | | 2.24×10^-4 | 3.50×10^-1
  GO:0034330: cell junction organization | | 2.30×10^-3 | 3.54
  GO:0000904: cell morphogenesis involved in differentiation | | 2.49×10^-3 | 3.83
Turquoise module | 196 | |
  GO:0046907: intracellular transport | | 2.89×10^-5 | 5.04×10^-2
  GO:0016192: vesicle-mediated transport | | 1.72×10^-4 | 2.99×10^-1
  GO:0051270: regulation of cell motion | | 3.15×10^-4 | 5.46×10^-1
  GO:0001701: in utero embryonic development | | 4.69×10^-4 | 8.14×10^-1
  GO:0010033: response to organic substance | | 4.69×10^-4 | 8.14×10^-1
Yellow module | 191 | |
  GO:0048732: gland development | | 1.61×10^-5 | 2.61×10^-2
  GO:0042981: regulation of apoptosis | | 1.83×10^-5 | 2.97×10^-2
  GO:0043067: regulation of programmed cell death | | 1.92×10^-5 | 3.11×10^-2
  GO:0010941: regulation of cell death | | 2.61×10^-4 | 4.21×10^-1
  GO:0009611: response to wounding | | 5.23×10^-4 | 8.43×10^-1

B, Non-aggressive breast cancer cell lines

Top five terms | No. of genes in ME | P-value | FDR
Blue module | 374 | |
  GO:0006796: phosphate metabolic process | | 2.79×10^-4 | 4.80×10^-1
  GO:0006793: phosphorus metabolic process | | 2.79×10^-4 | 4.80×10^-1
  GO:0000075: cell cycle checkpoint | | 2.91×10^-4 | 4.99×10^-1
  GO:0010033: response to organic substance | | 4.34×10^-4 | 7.43×10^-1
  GO:0046907: intracellular transport | | 5.63×10^-4 | 9.65×10^-1
Brown module | 90 | |
  GO:0007178: transmembrane receptor protein serine/threonine kinase signaling pathway | | 6.48×10^-3 | 9.26
  GO:0051789: response to protein stimulus | | 7.02×10^-3 | 10.24
  GO:0009615: response to virus | | 7.58×10^-3 | 10.75
  GO:0030509: BMP signaling pathway | | 1.11×10^-2 | 15.84
  GO:0006955: immune response | | 1.29×10^-2 | 17.61
Turquoise module | 1,345 | |
  GO:0007155: cell adhesion | | 8.22×10^-12 | 1.49×10^-8
  GO:0022610: biological adhesion | | 8.85×10^-12 | 1.61×10^-8
  GO:0009611: response to wounding | | 1.83×10^-11 | 3.32×10^-8
  GO:0048732: gland development | | 3.85×10^-11 | 6.98×10^-8
  GO:0001568: blood vessel development | | 4.20×10^-11 | 7.63×10^-8

GO, gene ontology; FDR, false discovery rate; ME, module eigengene.

Table II shows the distribution of biological processes across the modules in the aggressive and non-aggressive groups. The results demonstrated that the biological processes differed between the two groups. Previous publications have reported that tumor metastasis is closely associated with cell adhesions (23,24), cytoskeletal development (25), cell growth (26) and the glycolysis pathway (5). Therefore, the blue and red modules in the aggressive group were considered to be associated with metastasis.

Visualization of intramodular network construction for identification of hub genes and specific network connections of breast cancer metastasis

To identify the metastasis-specific network, the connections with the greatest TO in metastatic breast cancer were depicted for the blue and red modules using Cytoscape 3.01 (27). The specific network of metastatic breast cancer (ASij>0.8) was obtained using the previously described equation (2). Hub genes (those with the strongest connections with other genes) generally have important functions in biological networks (28,29). Fig. 3 presents the specific co-expression network in the blue and red modules.
Figure 3.

Visualization of the specific networks of the blue and red modules in aggressive and non-aggressive breast cancer. (A) The light blue nodes represent specific nodes and linkage in the blue module network of aggressive breast cancer. The dark blue nodes represent the overlap between aggressive and non-aggressive breast cancer. (B) The light blue nodes represent specific nodes in non-aggressive breast cancer. The dark blue nodes represent the overlap between aggressive and non-aggressive breast cancer. (C) The light red nodes represent the specific nodes and linkage in the red module network of aggressive breast cancer. The dark red nodes represent the overlap between aggressive and non-aggressive breast cancer. (D) The light red nodes represent the specific nodes in non-aggressive breast cancer. The dark red nodes represent the overlap between aggressive and non-aggressive breast cancer.

Fig. 3A and B presents the comparison of the specific co-expression network of the blue module in aggressive breast cancer and non-aggressive breast cancer. These were filtered to obtain the top 20% greatest TO of aggressive breast cancer and non-aggressive cancer. The overlapping nodes (dark blue nodes) were arranged into similar locations in the network and the nodes demonstrated the difference in connectivity between the aggressive group and non-aggressive group in the blue module. The aggressive group had the sparse connectivity and the non-aggressive group had the dense connectivity. In Fig. 3C and D, the red module network also demonstrated the difference in network topology between aggressive group and non-aggressive group. For further investigation of the difference of the modules networks, hub genes were selected for analysis. Table III presents the top five genes with high intramodule connectivity (k) as hub genes in the aggressive group.
Table III.

List of top five genes with high k as hub genes in blue and red modules.

A, Blue module of aggressive group

Gene symbol | UniProt accession | Gene name | k_in (normalized)
TMEM47 | Q9BQJ4 | Transmembrane protein 47 | 1.000
GJC1 | P36383 | Gap junction γ-1 protein | 0.929
ANXA3 | P12429 | Annexin A3 | 0.925
TWIST1 | Q15672 | Twist-related protein 1 | 0.917
C19orf33 | Q9GZP8 | Immortalization upregulated protein | 0.905

B, Red module of non-aggressive group

Gene symbol | UniProt accession | Gene name | k_in (normalized)
LOC100653217 | - | Neurotrimin-like | 1.000
CXCL12 | P48061 | Stromal cell-derived factor 1 | 0.958
SULF1 | Q8IWU6 | Sulfatase 1 | 0.936
DOK5 | Q9P104 | Docking protein 5 | 0.819
DKK3 | Q9UBP4 | Dickkopf-related protein 3 | 0.782

The greatest k values in the aggressive group are presented in Table III. The hub genes in the blue module of the aggressive group were all absent in the non-aggressive group. In the red module of the aggressive group, the hub genes stromal cell-derived factor 1 (CXCL12) and docking protein 5 (DOK5) were present in the non-aggressive group. GJC1, ANXA3 and TWIST1 have been previously reported to be associated with metastatic tumors (30–32). GJC1 has been associated with breast cancer through amplification of erb-b2 receptor tyrosine kinase 2 (ERBB2), which is an important breast cancer marker (33). ANXA3 was previously reported as a novel biomarker for lymph node metastasis and prognosis in lung cancer (31). TWIST1 is an extensively studied regulator associated with breast cancer metastasis; it is considered to be a master regulator of embryonic morphogenesis and has an essential role in metastasis (32,34,35). To the best of our knowledge, transmembrane protein 47 (TMEM47) and immortalization upregulated protein (C19orf33) have not been reported to be involved in breast cancer metastasis. In the red module, CXCL12, sulfatase 1 (SULF1), DOK5 and DKK3 have all been reported to be closely associated with breast cancer metastasis. CXCL12 possesses angiogenic properties and is involved in the outgrowth and metastasis of C-X-C motif chemokine receptor 4-expressing tumors and certain inflammatory autoimmune disorders (36). SULF1 overexpression is considered a prognostic and metastasis predictive marker in human gastric cancer (37). DOK5 expression has been indicated to significantly enhance the metastatic potential of the B16F10 cell line (38). DKK3 expression increased cell-cell adhesion and decreased cell migration (39). The function of neurotrimin-like (LOC100653217) is currently unclear. For validation of the hub genes using clinical data, invasive breast carcinoma data were retrieved from the TCGA and GEO databases. The HR and P-value from Cox regression analysis were calculated and are presented in Table IV.
Table IV.

Nine hub genes predictive of survival in patients with breast cancer in The Cancer Genome Atlas database.

Gene symbol | Gene name | Hazard ratio | Cox P-value | Confidence interval (95%)
TMEM47 | Transmembrane protein 47 | 1.161 | 0.004 | 1.049–1.286
GJC1 | Gap junction γ-1 protein | 1.192 | 0.025 | 1.022–1.390
ANXA3 | Annexin A3 | 1.114 | 0.016 | 1.021–1.214
TWIST1 | Twist-related protein 1 | 1.145 | 0.019 | 1.022–1.283
C19orf33 | Immortalization upregulated protein | 0.956 | 0.118 | 0.903–1.012
CXCL12 | Stromal cell-derived factor 1 | 1.203 | 0.001 | 1.076–1.345
SULF1 | Sulfatase 1 | 0.950 | 0.375 | 0.848–1.064
DOK5 | Docking protein 5 | 1.045 | 0.395 | 0.944–1.158
DKK3 | Dickkopf-related protein 3 | 1.194 | 0.004 | 1.059–1.344

Table IV demonstrates that C19orf33, SULF1 and DOK5 had P>0.05. The other genes were significantly associated with the survival time of patients with breast cancer and were high-risk genes (HR>1). Of these genes, TMEM47, CXCL12 and TWIST1 have been demonstrated to be closely associated with breast cancer aggression in previous studies (40–42). The other genes with P<0.05 may also be promising biomarkers for the prediction of survival in patients with breast cancer, although further study is required. Generally, breast cancer aggressiveness is closely associated with overall survival or disease relapse (43). Thus, the highest k hubs in the two modules were tested by survival analysis according to their expression. LOC100653217 was not found in the TCGA database. Therefore, Cox regression and survival analysis were used to determine the prognostic index of the nine remaining genes. The WPI obtained from the nine genes and 1,132 samples from TCGA was applied to classify patients into low-risk and high-risk groups (Fig. 4A). The log-rank test (Fig. 4B) demonstrated a statistically significant difference between the two groups classified by hub genes (log-rank test, P<0.05; hazard ratio=1.231, 95% confidence interval=1.058–1.433; Wald test, P=0.0071). Additionally, the recurrence of cancer is another important indicator for estimating aggressiveness. Thus, the GSE3494 dataset, which includes cancer relapse data of patients with breast cancer, was used to validate the hub genes. The results of log-rank testing demonstrated that patients in the high-risk group had a significantly shorter time to relapse compared with patients in the low-risk group (log-rank test, P<0.05). The area under the curve of the receiver operating characteristic was 0.697, which suggests that the integrated hub genes are good predictors of breast cancer relapse (Fig. 5).
Figure 4.

Kaplan-Meier survival curves for testing hub genes in blue and red module. (A) The classification of low-risk and high-risk by WPI of hub genes in overall survival (days). (B) Kaplan-Meier curve obtained from WPI classification by hub genes expression in breast cancer patients (P=0.0248). WPI, weighted prognostic index; TCGA, The Cancer Genome Atlas.

Figure 5.

Kaplan-Meier survival curves and ROC curves for testing hub genes in the blue and red modules in the GSE3494 dataset. (A) Kaplan-Meier curve obtained from the weighted prognostic index classification by hub gene expression in patients with breast cancer (P=0.0241). (B) The ROC curve had an area under the curve of 0.697 in the validation dataset. GEO, Gene Expression Omnibus; ROC, receiver operating characteristic.
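For reference, the area under the ROC curve can be read as the probability that a randomly chosen relapsed patient has a higher prognostic index than a randomly chosen non-relapsed patient. The sketch below computes it directly from that definition for a binary relapse label. It ignores censoring and follow-up time, which is a simplifying assumption, and the scores and labels are made-up toy values.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: prognostic index per patient and relapse status (1 = relapse).
scores = [0.9, 0.4, 0.6, 0.2, 0.6]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported value of 0.697 lies between the two.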

Discussion

The current study used WGCNA to explore gene co-expression in aggressive and non-aggressive breast cancer cell lines. Network depictions can provide immediate functional insights by revealing associations between genes and biological processes. Comparative network analysis can also prioritize genes for further investigation on the basis of differential connectivity, with previous studies supporting gene connectivity as a measure of functional relevance (44,45). The current study is based on previous reports that classified breast cancer cell lines as aggressive or non-aggressive. However, whether the MDA-MB-435 cell line is a breast cancer cell line or a melanoma cell line has been controversial (46–48). Rae et al (46) and Capes-Davis et al (47) reported that it is a melanoma cell line, owing to the similarity of its karyotype and gene expression pattern to those of melanoma cells, whereas Chambers (48) considered both cell lines (MDA-MB-435 and M14) to be of breast cancer origin. According to Han et al (14), the MDA-MB-435 cell line indeed represents a poorly differentiated, aggressive breast tumor line, as indicated by overexpression of the SATB homeobox 1 (SATB1) gene. The present study focused on the co-expression network of aggressive and non-aggressive breast cancer cells. Therefore, the MDA-MB-435 cell line was included as an aggressive breast cancer cell line. Breast cancer is one of the most common malignant diseases and its various types have been extensively investigated. Co-expression network analysis is a powerful tool that has also been applied to the study of breast cancer. In previous studies, WGCNA was used to analyze the association between gene expression in breast cancer and the clinical traits of patients (9). In this study, WGCNA was applied to construct co-expression networks for aggressive and non-aggressive breast cancer cell lines. The blue and red modules were closely associated with an aggressive phenotype according to previous publications.
According to the current literature regarding metastatic breast cancer, the biological mechanisms of aggressiveness are associated with cell adhesion (23,24), cytoskeletal development (25), cell growth (26) and the glycolysis pathway (5). The results of the current study demonstrated that the blue and red modules were closely associated with the above biological processes, excluding glycolysis. Following filtering of the data, the hub genes in the blue and red modules were identified. Many of the hub genes have previously been demonstrated to be associated with metastasis. However, the associations among these genes and the differences in their co-expression between aggressive and non-aggressive breast cancer remain unclear. In the red module network, genes such as DKK3, glycosyltransferase 8 domain containing 2, fibronectin 1, cadherin 13 and LOC100653217 were all present in both the aggressive and the non-aggressive group; however, these genes had different connections in each group. For example, DKK3 was a hub gene in both groups but had different connectivity in the two groups. The connectivity of DKK3 in the non-aggressive group was stronger than in the aggressive group. According to previous publications, DKK3 expression can inhibit tumor metastasis (39,49). Although the P-value from Cox regression of DKK3 was <0.05, the stronger connection of DKK3 in non-aggressive cell lines and the weaker connection of DKK3 in aggressive cell lines indicated that this gene may be a potential biomarker for breast cancer aggressiveness. In the blue module network, the top five hub genes were all absent in the non-aggressive group. The genes present in both groups also differed in their connections, and the non-aggressive group was more densely connected than the aggressive group. The results indicated that most of the top five hub genes were associated with tumor metastasis.
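The comparison of a gene's connectivity between two groups can be sketched as follows. The example assumes the standard unsigned WGCNA adjacency, a_ij = |cor(x_i, x_j)|^β, with connectivity k_i taken as the sum of a gene's adjacencies to all other genes. The soft-threshold power β=6 and the toy expression matrices are assumptions for illustration, and the study itself may have compared topological overlap rather than raw adjacency.

```python
import numpy as np

def connectivity(expr, beta=6):
    """Per-gene connectivity from a samples x genes expression matrix.

    Uses the unsigned WGCNA adjacency |correlation| ** beta and sums each
    gene's adjacency to every other gene (self-connections excluded).
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # a gene is not its own neighbor
    return adj.sum(axis=1)

# Toy group A: the three genes are perfectly (anti-)correlated.
group_a = np.array([[1, 2, 5],
                    [2, 4, 4],
                    [3, 6, 3],
                    [4, 8, 2]])

# Toy group B: the same first gene, but uncorrelated with the other two.
group_b = np.array([[1,  1, -1],
                    [2, -1,  3],
                    [3, -1, -3],
                    [4,  1,  1]])

k_a = connectivity(group_a)
k_b = connectivity(group_b)
diff_first_gene = k_a[0] - k_b[0]  # connectivity difference for gene 0
```

In this toy case the first gene is fully connected in group A and unconnected in group B, which is the kind of contrast described for hub genes such as DKK3.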
Although the function of certain genes in tumor metastasis was unclear, their high connectivity and HR may indicate that they have important roles in metastasis. Previous studies have identified various markers for breast cancer metastasis and prognosis. For example, SATB1 is considered to be an important gene for breast cancer metastasis and prognosis (14). ERBB2, plasminogen activator urokinase and plasminogen activator inhibitor 1 are also important markers in breast cancer prognosis (1). Other research identified that p53, Na-K ATPase-β1 and transforming growth factor-β receptor 2 are associated with survival (9). Although the function of an individual gene can reflect some aspects of metastasis, cancer metastasis is a multi-step cascade (50). Analysis of the expression of multiple genes may therefore provide more accurate information on the underlying mechanisms. In the current study, differences in connections may provide more information than differences in the expression of individual genes, as they can reflect the different cellular mechanisms of aggressive and non-aggressive breast cancer. The data analysis may thus provide potential candidate biomarkers for metastasis. Finally, the prognostic index (PI) was used to integrate these hub genes, which were then investigated in clinical data obtained from TCGA and GEO. The results demonstrate that the PI of the hub genes can significantly predict clinical outcome. In further studies, other potential genes are expected to be validated. The results may provide new insight into the potential mechanism of aggressiveness of breast cancer.
  47 in total

1.  Hierarchical organization of modularity in metabolic networks.

Authors:  E Ravasz; A L Somera; D A Mongru; Z N Oltvai; A L Barabási
Journal:  Science       Date:  2002-08-30       Impact factor: 47.728

2.  An integrative genomics approach to infer causal associations between gene expression and disease.

Authors:  Eric E Schadt; John Lamb; Xia Yang; Jun Zhu; Steve Edwards; Debraj Guhathakurta; Solveig K Sieberts; Stephanie Monks; Marc Reitman; Chunsheng Zhang; Pek Yee Lum; Amy Leonardson; Rolf Thieringer; Joseph M Metzger; Liming Yang; John Castle; Haoyuan Zhu; Shera F Kash; Thomas A Drake; Alan Sachs; Aldons J Lusis
Journal:  Nat Genet       Date:  2005-06-19       Impact factor: 38.330

3.  TWIST1 expression in breast cancer cells facilitates bone metastasis formation.

Authors:  Martine Croset; Delphine Goehrig; Agnieszka Frackowiak; Edith Bonnelye; Stéphane Ansieau; Alain Puisieux; Philippe Clézardin
Journal:  J Bone Miner Res       Date:  2014-08       Impact factor: 6.741

4.  Expression of Dickkopf genes is strongly reduced in malignant melanoma.

Authors:  S Kuphal; S Lodermeyer; F Bataille; M Schuierer; B H Hoang; A K Bosserhoff
Journal:  Oncogene       Date:  2006-03-27       Impact factor: 9.867

5.  MDA-MB-435 cells are derived from M14 melanoma cells--a loss for breast cancer, but a boon for melanoma research.

Authors:  James M Rae; Chad J Creighton; Jeanne M Meck; Bassem R Haddad; Michael D Johnson
Journal:  Breast Cancer Res Treat       Date:  2006-09-27       Impact factor: 4.872

Review 6.  Multi-step cascade of tumor cell metastasis.

Authors:  M L Stracke; L A Liotta
Journal:  In Vivo       Date:  1992 Jul-Aug       Impact factor: 2.155

Review 7.  Movers and shakers: cell cytoskeleton in cancer metastasis.

Authors:  C M Fife; J A McCarroll; M Kavallaris
Journal:  Br J Pharmacol       Date:  2014-07-02       Impact factor: 8.739

8.  Discovery of significant pathways in breast cancer metastasis via module extraction and comparison.

Authors:  Xiaochen Wang; Huajie Qian; Shuqin Zhang
Journal:  IET Syst Biol       Date:  2014-04       Impact factor: 1.615

9.  Eigengene networks for studying the relationships between co-expression modules.

Authors:  Peter Langfelder; Steve Horvath
Journal:  BMC Syst Biol       Date:  2007-11-21

10.  An integrated mRNA and microRNA expression signature for glioblastoma multiforme prognosis.

Authors:  Jie Xiong; Zhitong Bing; Yanlin Su; Defeng Deng; Xiaoning Peng
Journal:  PLoS One       Date:  2014-05-28       Impact factor: 3.240

