Literature DB >> 31758680

Identification of Hub Genes Using Co-Expression Network Analysis in Breast Cancer as a Tool to Predict Different Stages.

Yun Fu1, Qu-Zhi Zhou2, Xiao-Lei Zhang3, Zhen-Zhen Wang4, Peng Wang1.   

Abstract

BACKGROUND Breast cancer has a high mortality rate and is the most common cancer of women worldwide. Our gene co-expression network analysis identified the genes closely related to the pathological stage of breast cancer. MATERIAL AND METHODS We performed weighted gene co-expression network analysis (WGCNA) from the Gene Expression Omnibus (GEO) database, and performed pathway enrichment analysis on genes from significant modules. RESULTS A non-metastatic sample (374) of breast cancer from GSE102484 was used to construct the gene co-expression network. All 49 hub genes have been shown to be upregulated, and 19 of the 49 hub genes are significantly upregulated in breast cancer tissue. The roles of the genes CASC5, CKAP2L, FAM83D, KIF18B, KIF23, SKA1, GINS1, CDCA5, and MCM6 in breast cancer are unclear, so in order to better reveal the staging of breast cancer markers, it is necessary to study those hub genes. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes indicated that 49 hub genes were enriched to sister chromatid cohesion, spindle midzone, microtubule motor activity, cell cycle, and something else. Additionally, there is an independent data set - GSE20685 - for module preservation analysis, survival analysis, and gene validation. CONCLUSIONS This study identified 49 hub genes that were associated with pathologic stage of breast cancer, 19 of which were significantly upregulated in breast cancer. Risk stratification, therapeutic decision making, and prognosis predication might be improved by our study results. This study provides new insights into biomarkers of breast cancer, which might influence the future direction of breast cancer research.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 31758680      PMCID: PMC6886326          DOI: 10.12659/MSM.919046

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Breast cancer has a high mortality rate and is the most common cancer of women worldwide. A study published in 2013 used statistical methods to estimate that the number of new cancer cases in Europe in 2012 would be about 3.45 million, to which the largest contributor would be breast cancer. With the third-highest mortality [1]. The incidence of breast cancer in postmenopausal women is higher [2], and breast cancer is considered one of the leading causes of death in postmenopausal women, accounting for 23% of all cancer deaths [3]. In recent years, the morbidity rate of women under 50 years old has been increasing, and due to the female infertility induced by breast cancer treatments such as chemotherapy, women are caught in a dilemma of choosing between survival and fertility [4]. The occurrence of breast cancer is related to heredity [5]; therefore, the BRCA1/2 mutation has been studied for use in risk assessment in families with a high prevalence of breast cancer [6]. Many genetic tests such as the PAM50 ROR, Breast Cancer Index, and EndoPredict have been reported to predict the development of advanced recurrences [6]. There are many treatments for breast cancer, including chemotherapy, surgery, targeted therapy, hormone replacement therapy, radiation therapy, and combination therapy [3], but the large number of treatments available is a proof that each treatment is not fully effective. The poor prognosis of breast cancer is related to drug resistance, and inappropriate pathological stage may aggravate it. It has been shown that scientific and accurate pathological staging can improve individual prognosis. Weighted gene co-expression network analysis (WGCNA) is a systematic and comprehensive biological method to explore the correlation between genes, which stands out among many biological methods. The WGCNA approach operates on 2 assumptions: that molecules with similar expression patterns may be involved in specific biological functions (co-regulation of genes), and scale-free distribution [7]. In simple terms, the gene distribution is more consistent with the scale-free network distribution by selecting soft threshold β to weight the correlation coefficient so as to maximize the use of information and avoid information loss [8]. To facilitate their use, a WGCNA R software package summarized and standardized its methods, including network construction, module detection, gene selection, topological property calculation, and visualization [9]. WGCNA builds modules by identifying potential links and correlations between high-throughput genes. Modules closely related to clinical features were used as hub modules for subsequent analysis until the discovery of hub genes tightly related to the disease. This method is not only used for the detection of specific biomarkers of normal and abnormal tissues (such as cancer screening and specific biomarkers of gene-related diseases), but also for the identification of hub genes between abnormal tissues (such as tumor staging, grading, and metastasis). Therefore, the purpose of the present study was to apply the differentially expressed genes (DEGs) co-expression network constructed by WGCNA to identify a series of hub genes related with breast cancer pathological stage. These hub genes as biomarkers may provide better diagnosis and more effective treatment for breast cancer patients, thus leading to earlier detection and better results. This study may contribute to the establishment of a complete biomarker system for the pathological staging of breast cancer.

Material and Methods

Data collection and Preprocessing

The breast cancer gene expression profile of dataset GSE102484 [10], downloaded from the Gene Expression Omnibus (GEO) database (), was raw and was based on the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform. In the 683 samples of its data set, we selected 374 samples of non-metastatic breast cancer, and all of our samples were female. Then, the preprocessing of 374 original expression data includes the normalization of data using a robust multi-array averaging (RMA) [11] algorithm and the filtering of the nsFilter algorithm. There were 374 samples and 10 093 probes used for subsequent analysis. Furthermore, an independent data set, GSE20685 [12], also downloaded from GEO for module preservation analysis and validation.

Weighted gene co-expression network construction

The WGCNA package in R software was used to construct the co-expression network. First, samples with a Z.K value <−2.5 were deleted as outliers and did not participate in the later analysis. For the construction of a co-expression network, we converted the correlation matrix with the removed outlier samples into the adjacency matrix based on value, and the specific calculation was aij=|cor(xi, xj)|β, and Xi and xj are the nodes i and j of the network. The β was determined by scale-free topology criterion and R2 >0.8 made the network approximately meet the scale-free network distribution generally. For details, refer to original authors Zhang and Horvath. Then, this study transformed the adjacency matrix into a topological overlap matrix (TOM) after a series of complex calculations. TOM provides a simplified diagram of the network, allowing the visualization of the network and facilitating the identification of network modules. Then, the TOM graph was analyzed by average linkage hierarchical clustering based on the phase dissimilarity (1-TOM). Finally, the dynamic shear method was used to obtain the original modules and all the unidentified genes were assigned to a module. The original modules that we obtained were screened and merged before moving on to the next analysis. According to the author’s recommendation, the number of genes in each module was 30 and above. At the same time, each module was marked with a different color for the convenience of research, and the unrecognized genes were grayed out.

Module preservation analysis

To verify the stability of the module, gene expression profiles of 327 samples from data set GSE20685 were used for module preservation analysis. We used the module preservation method in WGCNA R software to calculate the Z summary score (Z score) and medianRank [13], which assesses whether the module is preserved or not. Since Z score and medianRank have their own advantages and disadvantages, in order to treat each module equally, the analysis method with both of them is usually adopted. If the Z score is greater than 10, it is considered that the module is highly preserved, and the higher the Z score is, the higher the stability of the module is, and the more reliable the subsequent analysis will be. A medianRank of the modules close to zero indicates a high degree of module preservation.

Identifying clinically significant modules

The selection of hub modules for subsequent analysis was based on the calculation of the correlation between clinical information and gene modules and the similarity of module expression in samples, including modules eigengene (ME), gene significance (GS), and module significance (MS).

Hub genes identification, validation, and functional annotation

The hub gene is defined as the gene with the highest degree of connectivity in the hub module. Specifically, our study determined hub genes based on 2 values, and selected geneModuleMembership >0.8 and geneTraitSignificance >0.2 as the hub genes. This limitation leads to a high degree of modularity and clinical characteristics of the hub gene. Moreover, genes in hub modules were projected into a protein–protein interaction (PPI) network to further clarify the interaction and association between genes, which was one of the references for our analysis of genes and diseases and the evidence supporting the status of hub genes. The “limma” package is used to identify differentially expressed genes that are widely used in disease and gene research. We used “limma” to test our hub genes. If the P value of the gene is less than the selected significance level (0.05 or 0.01), the selection of the gene is considered statistically significant and it is considered to be validated. Verification makes our selection of hub genes more scientific and convincing. To further clarify how hub genes influence related clinical characteristics of interesting modules, we used Enrichr () database to Gene Ontology (GO) function module of the gene annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis [14].

Survival analysis

Survival analysis is used to determine the relationship between the expression profile of one or more genomes and survival time. Survival analysis of WGCNA is usually done by the non-parametric Kaplan-Meier method [15]. The patients were divided into 2 groups according to median expression value of hub genes by the Kaplan-Meier method, and the results were presented by drawing a survival curve. As a common way to compare survival curves, the log-rank test can conclude that there is no statistically significant difference between groups by analyzing the significance of differences between actual and theoretical values. To assess the relationship between hub genes and breast cancer patients, the Kaplan-Meier survival analysis and log-rank test using the “survival” package of R software were conducted.

Results

Weighted co-expression network construction

The 374 non-metastatic breast cancer samples without stage IV in the independent data set GSE102484 from GEO were filtered by RMA method normalization and nsFilter. Since there were 7 samples with Z.K value <−2.5 (GSM2738700, GSM2738857, GSM2738923, GSM2738999, GSM2739109, GSM2739182, and GSM2739239), these 7 samples were considered as outliers and were excluded from subsequent analysis (Figure 1). Therefore, the gene expression profile of 367 samples was used to construct the gene co-expression network by using the WGCNA package. First, we selected the soft threshold β=5 (scale-free R2=0.90) according to Supplementary Figure 1 as the weighting coefficient to ensure a scale-free network. Second, our study used average linkage hierarchical clustering, the TOM-based dissimilarity, dynamic tree clipping, and merging processing to identify modules, and it obtained 18 modules marked with different colors (Supplementary Figure 2).
Figure 1

Sample dendrogram and trait heatmap. Seven samples with a Z.K value <−2.5 are outliers. The color intensity was proportional to older age as well as higher stage, Tstage, and Nstage.

We used “modulePreservation” [13] to calculate the preservation statistics of 2 independent modules and to determine whether the modules were preserved according to Z summary score and medianRank. Figure 2 shows that the Z summary score of the brown, turquoise, and yellow modules were all above 50, and the highest Z summary score of the brown module indicated that this module was the best preserved. To compensate for the weakness that Z summary score is dependent on module size, we continued to analyze the relevant medianRank. Among the 3 modules of brown, turquoise, and yellow, the medianRank of the brown module was still prominent. Brown modules, whose Z summery score is more than 60 were considered to be the best preserved modules in this study.
Figure 2

The medianRank and Zsummary statistics of the module preservation based on GSE20685. The medianRank of the modules close to zero indicates the high degree of module preservation, and the Zsummary of the modules close to zero indicates the low degree of module preservation.

Identifying hub modules

The selection of hub modules was based on the correlation between modules and traits. Therefore, brown modules were selected as the hub modules in this study according to the closest correlation between modules and stages (r=0.22, P=3e-05, Figure 3). At the same time, we also found that the brown module was also the most correlated with Tstage (r=0.18, P=4e-04, Figure 3), and the correlation between Nstage and the brown module was also high (r=0.14, P=0.008, Figure 3). Furthermore, the brown module showed high genetic significance and module membership (cor=0.49, P=4.4e-67, Supplementary Figure 3), which further confirmed the status of the brown module as a hub module.
Figure 3

Heatmap of the correlation between module eigengenes and clinical traits of breast cancer. Each module is based on the pattern of their co-expression. Stage indicated bipolar disorder (breast cancer and non-breast cancer).

Hub genes identification, validation and functional annotation

There were 1093 genes in the brown module and 49 that meet our requirements (geneModuleMembership >0.8 and geneTraitSignificance >0.2), and the special data are shown in Table 1. The 2290 genes were shown to be associated with breast cancer in previous studies through Junior Doc (), and 21 of our 49 hub genes had not been reported (e.g., TPX2, MCM10, NCAPG, and KIF4A). The PPI network of brown modules showed significant connectivity, and the overall effect of the network graph was good (Supplementary Figure 4). In this network, the size and color depth of nodes are proportional to their degree of connection, which facilitated our qualitative observation. The node degree, betweenness, stress, closeness, and clustering coefficient of 49 hub genes in the brown module were quantitatively compared (Table 2). AURKA gene has the largest node and the darkest color, which was an important object for our subsequent discussion and research. Independent GSE20685 was used by us for gene validation, and the condition for our hub genes to pass the verification was P<0.05. Supplementary Figure 5 intuitively shows our verification results, in which the P values of 49 hub genes are less than 0.05, which indicates that the selection of our 49 hub genes was statistically significant, scientific, and convincing. Meanwhile, according to |log2(fold change)| ≥0.5, 19 of the 49 hub genes were significantly upregulated (Supplementary Figure 6).
Table 1

The 49 hub genes most associated with breast cancer.

Probe IDGene symbolEntrez gene IDGene module membershipGene trait significance
209642_atBUB16990.9242932260.235607323
209408_atKIF2C110040.922189340.241907024
210052_s_atTPX2229740.9144807920.223144303
224753_atCDCA51131300.912054070.259025302
220651_s_atMCM10553880.9105630710.208487817
204822_atTTK72720.9079482290.219457711
204825_atMELK98330.9035143780.231607588
218726_atHJURP553550.903282390.22698554
225687_atFAM83D816100.8998866190.203016176
218663_atNCAPG641510.8960544260.205216007
218755_atKIF20A101120.8951669410.226348808
202870_s_atCDC209910.8922407160.239137318
221520_s_atCDCA8551430.891187460.244566866
206364_atKIF1499280.8909502520.26234143
209464_atAURKB92120.8905529660.22142283
203755_atBUB1B7010.8886728890.213901874
218355_atKIF4A241370.8879459910.228378944
202954_atUBE2C110650.8878183680.226938633
203554_x_atPTTG192320.8811550190.213686403
205046_atCENPE10620.8800692190.22209321
208079_s_atAURKA67900.8791520520.218061231
203358_s_atEZH221460.8757433880.207933283
222608_s_atANLN544430.8713397010.234975449
205339_atSTIL64910.8691072610.210922715
205024_s_atRAD5158880.8675085270.213183274
228069_atMTFR21131150.8649556840.211359463
228868_x_atCDT1816200.864787640.229292335
205733_atBLM6410.8607850040.217032378
228323_atCASC5570820.8570928960.202429723
201710_atMYBL246050.856350780.236329323
206102_atGINS198370.8563115740.208385671
222077_s_atRACGAP1291270.8538419210.256705857
204709_s_atKIF2394930.8513791930.21561731
229610_atCKAP2L1504680.850110160.229596216
204033_atTRIP1393190.8495932840.202381238
218009_s_atPRC190550.8495815710.221410912
207746_atPOLQ107210.8485291810.247638427
214804_atCENPI24910.8467732380.207038989
217640_x_atSKA12201340.8377394980.203041011
219650_atERCC6L548210.828217990.202549981
222039_atKIF18B1469090.8259337960.203151907
207828_s_atCENPF10630.8187164180.230482778
228273_atPRR11557710.8153317260.215057337
204023_atRFC459840.8152337990.213293858
213008_atFANCI552150.8105801050.2165155
204817_atESPL197000.8082840830.211142978
202107_s_atMCM241710.8039260630.207204313
209680_s_atKIFC138330.8010659760.213650537
201930_atMCM641750.8008712660.214708557

Gene module membership represents the degree of linkage between hub genes and other genes, while gene trait significance represents the relationship between genes and clinical features.

Table 2

The node degree, betweenness, stress, closeness, and clustering coefficient of 49 hub genes in brown module.

GeneNode degreeBetweennessStressClosenessClustering coefficient
AURKA500.0199425155610.77306122
TPX2490.008854084640.980392160.80272109
BUB1490.008854084640.980392160.80272109
BUB1B490.008854084640.980392160.80272109
CDCA8490.008854084640.980392160.80272109
KIF2C490.018608245080.980392160.78401361
TTK490.008854084640.980392160.80272109
NCAPG490.008854084640.980392160.80272109
KIF20A480.00787174180.961538460.81471631
MELK480.00787174180.961538460.81471631
CDC20470.005997243520.943396230.83718779
AURKB470.006030833520.943396230.83718779
MCM10470.015678584080.943396230.81128585
CENPF470.006030833520.943396230.83718779
UBE2C470.007308383860.943396230.82146161
KIF4A470.005387523360.943396230.84458834
RACGAP1460.005830643260.925925930.84251208
KIF23460.004570072940.925925930.85797101
CENPE450.004202292680.909090910.86464646
PRC1440.003950042480.892857140.86892178
CDCA5440.004482172520.892857140.86680761
HJURP440.004496432620.892857140.8615222
TRIP13430.003075172040.877192980.88704319
PTTG1410.003197211760.847457630.89268293
ANLN410.002340051600.847457630.90243902
FANCI410.002401471600.847457630.90243902
CDT1400.002499341580.833333330.89871795
MCM2400.002380951520.833333330.9025641
ESPL1400.002188841420.833333330.90897436
KIF14400.002589381520.833333330.9025641
RAD51380.002182961380.806451610.90184922
MCM6360.00155343980.781250.92222222
CASC5367.88E-04580.781250.95396825
SKA1360.00155359960.781250.92380952
KIF18B360.00151695980.781250.92222222
KIFC1360.00120015800.781250.93650794
ERCC6L360.001744361100.781250.91269841
CKAP2L350.00166605940.769230770.9210084
RFC4340.00115046740.757575760.93404635
MYBL2340.00150041820.757575760.92691622
POLQ330.006704631080.746268660.89772727
EZH2338.32E-04540.746268660.94886364
CENPI324.99E-04360.735294120.96370968
FAM83D292.09E-04160.704225350.98029557
STIL292.24E-04160.704225350.98029557
BLM266.48E-04440.675675680.93230769
GINS1263.55E-04260.675675680.96
PRR11185.66E-0540.60975610.9869281
MTFR215000.588235291
To determine the mechanism of action of the 49 hub genes in the brown module, the 49 hub genes were uploaded to Enichr for GO function annotation and KEGG enrichment analysis. GO function annotation indicated that the brown module was enriched to sister chromatid cohesion, spindle midzone, and microtubule motor activity (Table 3). KEGG enrichment analysis suggested the brown module was enriched in cell cycle (Table 4). Tables 3, 4 show that the crude and adjusted P value, Z score, the combined score of pathways, and genes were included in the pathway.
Table 3

GO function annotation of 49 hub genes.

SeriesNameP valueAdjusted P valueZ scoreCombined scoreGenes
GO Cellular ComponentSpindle midzone (GO: 0051233)7.05E-229.03E-20−2.19106.46KIF14; BUB1B; CDCA8; TTK; KIF23; AURKB; AURKA; CDC20; TPX2; CENPF; RACGAP1; PRC1; KIF20A
GO Cellular ComponentMitotic spindle (GO: 0072686)4.49E-182.37E-16−2.391.68CDC20; TPX2; CENPF; RACGAP1; ESPL1; CKAP2L; PRC1; TTK; KIF23; KIF20A; AURKB; AURKA
GO Cellular ComponentMitotic spindle midzone (GO: 1990023)5.56E-182.37E-16−2.0681.88TPX2; CENPE; RACGAP1; ESPL1; CKAP2L; KIF14; BUB1B; CDCA8; KIF23; AURKB; AURKA
GO Biological ProcessSister chromatid cohesion (GO: 0007062)6.67E-181.35E-15−2.76109.02CDC20; CENPE; CENPF; ERCC6L; CENPI; CDCA5; BUB1B; CDCA8; KIF2C; BUB1; AURKB; SKA1
GO Cellular ComponentSpindle microtubule (GO: 0005876)1.69E-175.42E-16−2.3289.73KIF14; TTK; KIF23; AURKB; SKA1; AURKA; CDC20; TPX2; CENPE; CENPF; PRC1; KIF4A; KIF20A
GO Cellular ComponentSpindle (GO: 0005819)2.60E-166.67E-15−2.4588.06TTK; KIF23; AURKB; SKA1; AURKA; CDC20; TPX2; CENPE; CENPF; PRC1; KIF2C; KIF20A; MCM2
GO Cellular ComponentMitotic spindle microtubule (GO: 1990498)1.87E-153.99E-14−2.171.08TPX2; RACGAP1; ESPL1; CKAP2L; PRC1; KIF4A; KIF23; AURKB; SKA1; AURKA
GO Biological ProcessMitotic cell cycle (GO: 0000278)9.03E-159.17E-13−2.8291.31CDT1; TPX2; CENPE; CENPF; KIF18B; CDCA5; BUB1B; KIF2C; SKA1; AURKA
GO Cellular ComponentMeiotic spindle (GO: 0072687)1.20E-142.20E-13−1.9863.49CDC20; TPX2; CENPF; PRC1; TTK; KIF23; KIF20A; AURKB; AURKA
GO Biological ProcessMicrotubule-based movement (GO: 0007018)2.69E-141.82E-12−2.4676.95CENPE; KIF18B; RACGAP1; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Cellular ComponentSpindle pole (GO: 0000922)6.45E-141.03E-12−2.1966.53CDC20; TPX2; CENPF; PRC1; TTK; KIF23; AUNIP; KIF20A; AURKB; AURKA
GO Cellular ComponentKinesin complex (GO: 0005871)9.68E-141.13E-12−1.7251.51CENPE; KIF18B; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Cellular ComponentKinesin I complex (GO: 0016938)9.68E-141.13E-12−1.6850.33CENPE; KIF18B; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATP-dependent microtubule motor activity (GO: 1990939)5.81E-131.22E-10−1.9153.71CENPE; KIF18B; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionMicrotubule motor activity (GO: 0003777)2.84E-122.98E-10−2.0654.82CENPE; KIF18B; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATP-dependent microtubule motor activity, plus-end-directed| (GO: 0008574)7.36E-115.15E-09−2.5960.38CENPE; BLM; KIF18B; KIFC1; KIF4A; KIF14; KIF2C; KIF23; KIF20A
GO Biological ProcessMitotic metaphase plate congression (GO: 0007080)4.10E-102.08E-08−2.7960.28CENPE; KIFC1; CDCA5; KIF14; CDCA8; KIF2C
GO Molecular FunctionProtein-DNA unloading ATPase activity (GO: 0140083)1.11E-094.14E-08−2.5151.79CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATPase activity, uncoupled (GO: 0042624)1.18E-094.14E-08−2.5652.69CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATP-dependent microtubule motor activity, minus-end-directed (GO: 0008569)1.33E-094.14E-08−2.5451.86CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATPase activity, coupled (GO: 0042623)1.49E-094.14E-08−2.6153.06CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionATPase activity (GO: 0016887)1.58E-094.14E-08−2.5351.31CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Molecular FunctionIntracellular ATPase-gated chloride channel activity (GO: 0005260)5.56E-091.30E-07−2.6350.06CENPE; BLM; KIF18B; KIFC1; KIF14; KIF2C; KIF23; KIF20A
GO Biological ProcessMitotic spindle midzone assembly (GO: 0051256)6.60E-092.68E-07−2.139.62RACGAP1; KIF4A; KIF23; AURKB
GO Biological ProcessMetaphase plate congression (GO: 0051310)1.04E-083.51E-07−2.4444.81CENPE; CENPF; KIF2C; FAM83D
GO Biological ProcessAnaphase-promoting complex-dependent catabolic process (GO: 0031145)4.13E-080.000001197−2.5142.73CDC20; PTTG1; UBE2C; BUB1B; AURKB; AURKA
GO Biological ProcessChromosome segregation (GO: 0007059)1.13E-070.000002705−2.3337.27CDT1; CENPE; CENPF; HJURP; SKA1
GO Biological ProcessSpindle organization (GO: 0007051)1.20E-070.000002705−2.1333.93TTK; AUNIP; AURKB; AURKA
GO Biological ProcessProtein ubiquitination involved in ubiquitin-dependent protein catabolic process (GO: 0042787)5.14E-070.00001043−2.8941.83CDC20; PTTG1; UBE2C; BUB1B; AURKB; AURKA
GO Molecular FunctionMicrotubule plus-end binding (GO: 0051010)7.81E-071.64E-05−2.3232.66RACGAP1; KIF14; KIF2C; KIF23; FAM83D; SKA1
Table 4

KEGG enrichment analysis of 49 hub genes.

NameP valueAdjusted P valueZ scoreCombined scoreGenes
Cell cycle6.34E-101.01E-08−5.14108.9CDC20; PTTG1; ESPL1; BUB1B; TTK; MCM6; BUB1; MCM2
Oocyte meiosis0.000013460.0001077−27.81311.96CDC20; PTTG1; ESPL1; BUB1; AURKA
DNA replication0.000093220.0004972−49.47459.08RFC4; MCM6; MCM2
Fanconi anemia pathway0.00031390.001256−42.68344.27FANCI; BLM; RAD51
Human T-cell leukemia virus 1 infection0.0020150.006448−19.4120.41CDC20; PTTG1; ESPL1; BUB1B
Homologous recombination0.0045370.0121−49.65267.88BLM; RAD51
Breast cancer patients were divided into a high-expression group and a low-expression group according to the median expression value of each hub gene, and survival curves were plotted accordingly. The P value of 37 of 49 hub genes was less than 0.05, indicating that the 37 hub genes were statistically significant (Figure 4). The survival curve generally shows a downward trend with the increase of time. On the graph line, the slope is larger, which means lower survival rate. The image of the high-expression group of 49 hub genes was steeper than that of the low-expression group, indicating that the high expression of these hub genes was closely related to the poor prognosis of patients.
Figure 4

Survival curves for patients in different groups. Red lines represent high expression of hub genes, while blue lines represent low expression of hub genes.

Hub gene mutation analysis

The independent data set GSE29044 from the GEO database was used for mutation analysis (Figure 5). All 49 hub genes were shown to be upregulated in the mutation analysis, suggesting that our hub genes are scientific and persuasive, and Supplementary Figure 6 shows that 19 of 49 hub genes were significantly upregulated.
Figure 5

Mutation analysis of 49 genes was based on independent data set GSE29044.

Discussion

Breast cancer is an epithelial malignant tumor of ductal lobules at the end of the mammary gland. It is the most common cancer among women and has a poor prognosis, causing a large number of deaths worldwide every year. The research activity on breast cancer is in direct proportion to its harm to human beings, but quantity does not mean quality; the pathogenesis of breast cancer has not been fully elucidated, and the genetic standard of breast cancer staging is not perfect. Although some studies have used the WGCNA method to explore molecular markers related to the pathogenesis, diagnosis, treatment, and prognosis of breast cancer, the present study may contribute to the establishment of a more complete set of molecular markers for pathological staging of breast cancer. This may lead to better treatment regimens for different patients and better prognostic estimates. In our study, we used 367 samples without stage IV to construct the co-expression network. After dynamic tree shearing, we identified 18 modules, among which the brown module showed the strongest correlation with pathological staging. The Z summery score and medianRank indicated that the brown module has good stability. Therefore, the brown module as a candidate module continues to identify candidate biomarkers. Finally, we identified the 49 hub genes that are candidate biomarkers for breast cancer pathological stage, and 21 hub genes that were not inquired about in Junior Doc. The PPI chart of 49 hub genes showed the ideal degree of connectivity, and it could be intuitively seen that the hub genes were interacting with each other. Both survival analysis and mutation analysis yielded satisfactory results: high expression of 49 hub genes was closely related to the poor prognosis of patients, and all 49 hub genes were shown to be upregulated in the mutation analysis. Moreover, to better illustrate how hub genes works, our study also carried out GO functional annotation and KEGG enrichment analysis for 49 hub genes. GO functional annotation of hub genes were suggested to focus on sister chromatid cohesion, mitotic cell cycle, spindle midzone, and microtubule motor activity. Similarly, hub genes identified by KEGG were enriched in the cell cycle and DNA replication. These hub pathways are basically all involved in cell division. The sister chromatid cohesion pathway gene members are CDC20, CENPE, CENPF, ERCC6L, CENPI, BUB1B, CDCA8, KIF2C, BUB1, AURKB, and SKA1. The sister chromatid cohesion pathway is a hub of mitotic chromosomes separation; this process is mediated by the cohesive element protein complexes. It has been reported in colorectal cancer [16], bladder cancer [17], head and neck squamous cell carcinoma [18], and other cancers. Previous studies on breast cancer found that sister chromatid cohesion can inhibit breast cancer cells and induce their apoptosis and autophagy [19], and the survival rate of breast cancer patients with defective sister chromatid cohesion expression is lower than those with higher sister chromatid cohesion expression [20]. In addition, studies have shown that cancer cells that recognize mutations in sister chromatid cohesion may suggest new therapeutic opportunities [21]. Cell cycle refers to the time needed for one cell to divide one time, and its members have CDC20, PTTG1, ESPL1, BUB1B, TTK, MCM6, BUB1, and MCM2. Many studies have shown that the stagnation of the cell cycle is a target for the treatment of breast cancer [22]. In terms of individual hub genes, KIF4A, MCM10, and TPX2 were also listed as the hub genes associated with poor prognosis of breast cancer in other studies that also used the WGCNA method to identify the pathological process of breast cancer [23]. In another study looking at biomarkers of prognosis in invasive breast cancer [24], only MELK in the 6 hub genes was the same as our hub gene, suggesting that our study could be complementary. A study using grade 1, 2, and 3 differentially expressed genes of cancer to construct a hierarchical specific molecular interaction network indicated that KIF2C and UBE2C are potential biomarkers for breast cancer diagnosis and prognosis [25]. The present study offers more new insights than the hub genes provided by the latest research on genetic markers for breast cancer [26]. Most of the hub genes associated with breast cancer are well understood. AURKA and AURKB also are leading predictors of poor prognosis [27]. AURKA has the highest degree of connectivity in the PPI network. Our research on this gene is relatively mature, and it has been reported in many studies that this gene is closely related to breast cancer. AURKA inhibitors have long been important drugs in the clinical treatment of breast cancer. The latest research shows that AURKA inhibitors combined with other inhibitors provide a new approach for the treatment of breast cancer [28]. BUB1B causes higher chromosomal instability in breast cancer cells [29], and BUB1 is associated with cancer stem cells [30]. CDT1, CDC20 CENPE CENPF, and CENPI expression in breast cancer cells are on the increase, and are significantly associated with shorter survival [31,32]. KIF14, KIF20A, KIF2C, KIF4A, and KIFC1 belonging to the kinesin family and have been proven to be potential biomarkers of breast cancer prognosis [33,34], but there have been few studies on KIF18B and KIF23. HJURP is a histone chaperone, a prognostic factor for disease-free survival and overall survival in breast cancer patients, and a predictive marker for radiotherapy sensitivity [35]. Among the hub genes in the MCM series, MCM2 and MCM10 have been extensively studied in the pathological process of breast cancer, but MCM6 has received relatively little attention. However, poor prognosis due to overexpression of MCM6 has been reported in lung cancer [36]. Additionally, the abnormal expression of TPX2 has basically become a universal biomarker for poor prognosis of cancer, which has been reported in gastric cancer, non-small cell lung cancer, liver cancer, and other cancers [37,38]. CASC5, also known as KNL1, is an important gene involved in chromosome separation and is expressed in various cancer cells. Inhibition of this gene expression can induce cell cycle arrest and inhibit cell proliferation and migration. KNL1 and BUB1 have similar effects and both function by activating the kinetochore-bound Mad1-Mad2 [39]. However, CASC5 has not been adequately investigated in relation to breast cancer, nor has BUB1. The protein encoded by the CKAP2L gene plays an important role in neuroprogenitor cell division, and mutations in this gene are associated with spindle tissue defects [40]. CKAP2L is reported to be the main cause of Filippi syndrome [40]. There have been few studies on the relationship between CKAP2L and cancer, and few studies have been reported so far: the deletion of this gene is associated with oral squamous cell carcinoma, and the latest study shows that the upregulated expression of this gene in lung adenocarcinoma may be closely correlated with the poor prognosis of patients. However, the exact role of CKAP2L in cancer progression, metastasis, and drug resistance remains unclear. In particular, research on the relationship between CKAP2L and breast cancer is scant. In addition, understanding of the molecular basis of CASC5, CKAP2L, FAM83D, KIF18B, KIF23, SKA1, GINS1, CDCA5, and MCM6 in breast cancer is poor, so in order to better reveal the staging of breast cancer markers, it is necessary to study the hub genes. Our understanding of certain genes is still incomplete, and our co-expression networks might provide new clues to the complex regulation of these different molecules. However, compared with other tumor databases with more tumor database samples, the data set samples that this study used are relatively small; there may be bias, and many related studies have been published. Furthermore, although it found that some new genes may be related to the pathological process of breast cancer, these new genes may not provide accurate information about the actual biological characteristics of the tumor.

Conclusions

We established a co-expression network to identify the hub genes related to the pathological staging of breast cancer, identifying 49 hub genes that were associated with the pathologic stage of breast cancer, 19 of which were significantly upregulated in breast cancer. Our results may provide new insights into biomarkers for breast cancer, but more research is needed to validate these findings Analysis of network topology for various soft-theholding powers. The left panel showed the scale-free fit index, signed R2 (y-axis) and the soft threshold power (x-axis). Clustering dendrogram of genes and modules identfied by weighted gene co-expression network analysis based on a dissimilarity measure (1-TOM). Scatter diagram for module membership vs. gene significance of stage (breast cancer or non-breast cancer) in brown module. The protein-protein network of the hub genes in brown module. Within the network, node sizes and color depth are proportional to their connectivity. The P value of 49 hub genes for gene validation. 19 hub genes were significantly up-egulated in breast cancer that indicates the high expression of those genes is related to breast cancer.
  40 in total

1.  Grade-specific diagnostic and prognostic biomarkers in breast cancer.

Authors:  V S P K Sankara Aditya Jayanthi; Asim Bikas Das; Urmila Saxena
Journal:  Genomics       Date:  2019-03-07       Impact factor: 5.736

2.  Balsamin induces apoptosis in breast cancer cells via DNA fragmentation and cell cycle arrest.

Authors:  Parminder K Ajji; Marley J Binder; Ken Walder; Munish Puri
Journal:  Mol Cell Biochem       Date:  2017-04-04       Impact factor: 3.396

Review 3.  Biomarkers in Breast Cancer: Where Are We and Where Are We Going?

Authors:  Michael J Duffy; Siun Walsh; Enda W McDermott; John Crown
Journal:  Adv Clin Chem       Date:  2015-06-23       Impact factor: 5.394

4.  Understanding survival analysis: Kaplan-Meier estimate.

Authors:  Manish Kumar Goel; Pardeep Khanna; Jugal Kishore
Journal:  Int J Ayurveda Res       Date:  2010-10

5.  Identification of key pathways and hub genes in basal-like breast cancer using bioinformatics analysis.

Authors:  Kaidi Yang; Jian Gao; Mao Luo
Journal:  Onco Targets Ther       Date:  2019-02-18       Impact factor: 4.147

6.  Television Viewing Time and Breast Cancer Incidence for Japanese Premenopausal and Postmenopausal Women: The JACC Study.

Authors:  Jinhong Cao; Ehab Salah Eshak; Keyang Liu; Isao Muraki; Renzhe Cui; Hiroyasu Iso; Akiko Tamakoshi
Journal:  Cancer Res Treat       Date:  2019-03-21       Impact factor: 4.679

7.  RNA Interference of IQ Motif Containing GTPase-Activating Protein 3 (IQGAP3) Inhibits Cell Proliferation and Invasion in Breast Carcinoma Cells.

Authors:  Gaowu Hu; Ye Xu; Wenquan Chen; Jiandong Wang; Chunying Zhao; Ming Wang
Journal:  Oncol Res       Date:  2016-10-27       Impact factor: 5.574

8.  Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.

Authors:  Edward Y Chen; Christopher M Tan; Yan Kou; Qiaonan Duan; Zichen Wang; Gabriela Vaz Meirelles; Neil R Clark; Avi Ma'ayan
Journal:  BMC Bioinformatics       Date:  2013-04-15       Impact factor: 3.169

9.  MCMs expression in lung cancer: implication of prognostic significance.

Authors:  Yi-Zhen Liu; Bo-Shi Wang; Yan-Yi Jiang; Jian Cao; Jia-Jie Hao; Yu Zhang; Xin Xu; Yan Cai; Ming-Rong Wang
Journal:  J Cancer       Date:  2017-10-09       Impact factor: 4.207

10.  Prognostic Genes of Breast Cancer Identified by Gene Co-expression Network Analysis.

Authors:  Jianing Tang; Deguang Kong; Qiuxia Cui; Kun Wang; Dan Zhang; Yan Gong; Gaosong Wu
Journal:  Front Oncol       Date:  2018-09-11       Impact factor: 6.244

View more
  11 in total

1.  Effect of CDCA5 on Proliferation and Metastasis of Triple Negative Breast Cancer Cells under shRNA Interference Technology.

Authors:  Yilin Li; Wei Peng; Yunfang Deng; Jianfu Heng; Yang Yang; Xia Jin; Jigang Li; Ting Li
Journal:  J Oncol       Date:  2022-06-11       Impact factor: 4.501

2.  Bioinformatics identification of hub genes and signaling pathways regulated by intravenous immunoglobulin treatment in acute Kawasaki disease.

Authors:  Hongbiao Huang; Lei Xu; Yueyue Ding; Jie Qin; Chengcheng Huang; Xuan Li; Yunjia Tang; Guanghui Qian; Haitao Lv
Journal:  Exp Ther Med       Date:  2021-05-19       Impact factor: 2.447

3.  Recombination and Pol ζ Rescue Defective DNA Replication upon Impaired CMG Helicase-Pol ε Interaction.

Authors:  Milena Denkiewicz-Kruk; Malgorzata Jedrychowska; Shizuko Endo; Hiroyuki Araki; Piotr Jonczyk; Michal Dmowski; Iwona J Fijalkowska
Journal:  Int J Mol Sci       Date:  2020-12-13       Impact factor: 5.923

4.  Identification of the hub genes RUNX2 and FN1 in gastric cancer.

Authors:  Chao Han; Lei Jin; Xuemei Ma; Qin Hao; Huajun Lin; Zhongtao Zhang
Journal:  Open Med (Wars)       Date:  2020-05-30

5.  Overexpression of SKA3 correlates with poor prognosis in female early breast cancer.

Authors:  Yue Zhong; Zhenjie Zhuang; Peiju Mo; Mandi Lin; Jiaqian Gong; Jiarong Huang; Haiyan Mo; Yuyun Lu; Mei Huang
Journal:  PeerJ       Date:  2021-12-13       Impact factor: 2.984

6.  CKAP2L, as an Independent Risk Factor, Closely Related to the Prognosis of Glioma.

Authors:  Li Zhu; Yanlei Zheng; Ronghua Hu; Chenchen Hu
Journal:  Biomed Res Int       Date:  2021-09-28       Impact factor: 3.411

7.  KIF18B promotes breast cancer cell proliferation, migration and invasion by targeting TRIP13 and activating the Wnt/β-catenin signaling pathway.

Authors:  Lan Liu; Zhaofeng Zhang; Xiulin Xia; Jing Lei
Journal:  Oncol Lett       Date:  2022-02-08       Impact factor: 2.967

8.  Knockdown of CDCA5 suppresses malignant progression of breast cancer cells by regulating PDS5A.

Authors:  Yang Wang; Jian Yao; Yulin Zhu; Xuejiao Zhao; Jing Lv; Fulan Sun
Journal:  Mol Med Rep       Date:  2022-05-04       Impact factor: 3.423

9.  Identification of hub genes in colorectal cancer based on weighted gene co-expression network analysis and clinical data from The Cancer Genome Atlas.

Authors:  Yu Zhang; Jia Luo; Zhe Liu; Xudong Liu; Ying Ma; Bohang Zhang; Yuxuan Chen; Xiaofeng Li; Zhiguo Feng; Ningning Yang; Dayun Feng; Lei Wang; Xinqiang Song
Journal:  Biosci Rep       Date:  2021-07-30       Impact factor: 3.840

10.  Prediction of Disease Genes Based on Stage-Specific Gene Regulatory Networks in Breast Cancer.

Authors:  Linzhuo Fan; Jinhong Hou; Guimin Qin
Journal:  Front Genet       Date:  2021-07-15       Impact factor: 4.599

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.