Gene Regulatory Network Analysis for Triple-Negative Breast Neoplasms by Using Gene Expression Data.

Hee Chan Jung1, Sung Hwan Kim2, Jeong Hoon Lee3, Ju Han Kim3, Sung Won Han4.   

Abstract

PURPOSE: To better identify the physiology of triple-negative breast neoplasm (TNBN), we analyzed the TNBN gene regulatory network using gene expression data.
METHODS: We collected TNBN gene expression data from The Cancer Genome Atlas to construct a TNBN gene regulatory network using least absolute shrinkage and selection operator regression. In addition, we constructed a triple-positive breast neoplasm (TPBN) network for comparison. Furthermore, survival analysis based on gene expression levels and differentially expressed gene (DEG) analysis were carried out to support and compare the network analysis results, respectively.
RESULTS: The TNBN gene regulatory network, which followed a power-law distribution, had 10,237 vertices and 17,773 edges, with an average vertex-to-vertex distance of 8.6. The genes ZDHHC20 and RAPGEF6 were identified by centrality analysis to be important vertices. However, in the DEG analysis, we could not find meaningful fold changes in ZDHHC20 and RAPGEF6 between the TPBN and TNBN gene expression data. In the multivariate survival analysis, the hazard ratio for ZDHHC20 and RAPGEF6 was 1.677 (1.192-2.357) and 1.676 (1.222-2.299), respectively.
CONCLUSION: Our TNBN gene regulatory network was a scale-free one, which means that the network would be easily destroyed if the hub vertices were attacked. Thus, it is important to identify the hub vertices in the network analysis. In the TNBN gene regulatory network, ZDHHC20 and RAPGEF6 were found to be oncogenes. Further study of these genes could help to reveal a novel method for treating TNBN in the future.

Keywords:  Genes; Oncogenes; Triple negative breast neoplasms

Year:  2017        PMID: 28970849      PMCID: PMC5620438          DOI: 10.4048/jbc.2017.20.3.240

Source DB:  PubMed          Journal:  J Breast Cancer        ISSN: 1738-6756            Impact factor:   3.588


INTRODUCTION

Breast cancer is a serious disease among women and has become increasingly prevalent worldwide [1]. Triple-negative breast neoplasm (TNBN) accounts for 15% to 20% of breast cancers and is difficult to treat, with a poor prognosis and a high recurrence rate [2,3]. Over the years, a great deal of effort has been expended to enhance the efficacy of TNBN treatment using the angiogenesis inhibitor bevacizumab in combination with paclitaxel; however, this approach remains in the developmental stage [4]. Therefore, it is very important to better understand the physiology of TNBN. Many methods are available for studying the physiology of cancers at the gene level, such as differentially expressed gene (DEG) analysis and gene clustering and classification. However, these methods are limited in their ability to identify gene-gene interactions and connections. In addition, gene clustering and classification do not identify the important genes within the clusters they form. Given these limitations, we attempted to construct a TNBN gene regulatory network using gene expression data. Previously, de Matos Simoes and Emmert-Streib [5] demonstrated the utility of gene expression data for constructing a gene regulatory network of breast cancer via the BC3Net method, and found significant pathways enriched for the cell cycle and immune response [6]. In contrast to their method, we used conditional independence graphs with least absolute shrinkage and selection operator (LASSO) regression to exclude falsely detected regulatory relationships. By doing this, we aimed to create a more precise network for identifying gene-gene interactions and hub genes. In addition, we constructed a network from triple-positive breast neoplasm (TPBN) gene expression data for comparison with the TNBN gene regulatory network, although TPBN is not a definitive entity of breast cancer.

METHODS

We retrieved RNA-Seq expression data for breast cancer from The Cancer Genome Atlas (TCGA) [7]. By definition, in terms of immunohistochemistry, TNBN is negative for the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), whereas TPBN is positive for all three receptors [7].

Data characteristics

Of the 1,088 patients with breast cancer logged in TCGA, 115 (10.6%) had TNBN and 97 (8.9%) had TPBN (Table 1). The mean age of the patients with TNBN was 54.73 years (range, 42.94–66.52 years), which was significantly lower than that of the patients with TPBN, 59.98 years (range, 45.88–74.08 years) (p=0.004, Student t-test). Of the combined TNBN and TPBN groups, 190 patients were alive and 22 patients were deceased at the time of study. The proportions of pathologic tumor stages were similar in both groups, with stage II being the most frequent, followed by stages III and I. Most patients had not received neoadjuvant chemotherapy, and most had a negative surgical margin (Table 1). Differences in neoadjuvant chemotherapy and surgical margin status were not statistically significant between the two groups (p>0.05, generalized Fisher exact test). RNA-Seq V2 expression levels (log2-transformed and normalized RNA-Seq by expectation–maximization values) were retrieved from the TCGA portal.

Table 1

Demographics of the triple-positive and triple-negative breast neoplasm patients

Characteristics | Triple-negative (n=115), No. (%) | Triple-positive (n=97), No. (%) | All (n=212), No. (%)
Stage
 I | 19 (16.5) | 9 (9.3) | 28 (13.2)
 II | 72 (62.6) | 56 (57.8) | 128 (60.4)
 III | 19 (16.5) | 30 (30.9) | 49 (23.1)
 IV | 2 (1.8) | 1 (1.0) | 3 (1.4)
 NA | 3 (2.6) | 1 (1.0) | 4 (1.9)
Age (yr)* | 54.73 (42.94–66.52) | 59.98 (45.88–74.08) | 57.09 (43.98–70.20)
Status
 Alive | 103 (89.6) | 87 (89.6) | 190 (89.6)
 Dead | 12 (10.4) | 10 (10.4) | 22 (10.4)
NAC
 Yes | 0 (0.0) | 4 (4.1) | 4 (1.9)
 No | 114 (99.1) | 93 (95.9) | 207 (97.6)
 NA | 1 (0.9) | 0 (0.0) | 1 (0.5)
Margin status
 Positive | 3 (2.6) | 4 (4.2) | 7 (3.3)
 Negative | 102 (88.7) | 81 (83.5) | 183 (86.3)
 Close | 4 (3.5) | 1 (1.0) | 5 (2.4)
 NA | 6 (5.2) | 11 (11.3) | 17 (8.0)
Race
 White | 67 (58.3) | 63 (64.9) | 130 (61.3)
 Black | 32 (27.9) | 8 (8.3) | 40 (18.9)
 Asian | 8 (6.9) | 5 (5.2) | 13 (6.1)
 NA | 8 (6.9) | 21 (21.6) | 29 (13.7)

NAC=neoadjuvant chemotherapy; NA=not available.

*Median (range).

Statistical analyses

The statistical analyses involved a two-stage analytical scheme: (1) regression-based network inference and (2) post hoc analysis. In the first stage, we estimated probabilistic neighbors (typically called a conditional independence graph) on the basis of gene expression in the triple-positive and triple-negative patients separately. We used LASSO regression to estimate the probabilistic neighbors, applying the optimal penalty parameter to control the probability of including falsely estimated neighbors [8,9]. The LASSO-based approach estimates a network by finding probabilistic neighbors around each node; it is computationally efficient and requires only a small amount of memory. Thus, this approach is well suited to such high-dimensional data. The estimated neighbors indicate functional interactions between genes. We then calculated the degree of each gene, that is, the number of its neighboring genes; genes with a high degree are referred to as hub genes. Subsequently, we sorted the genes by degree from large to small, and performed a post hoc analysis to understand the biological functions of the hub genes. In the second stage, both univariate and multivariate Cox proportional hazard models were used to assess survival in relation to hub gene expression. Age, pathologic tumor stage, and ER, PR, and HER2 status were used as covariates to adjust the hub gene effects in the multivariate survival model. In addition, DEGs between TNBN and TPBN were selected by computing empirical Bayes moderated t-statistics on the log2-transformed RNA-Seq data, with cutoff thresholds of a Bonferroni-corrected p-value of <0.05 and an absolute log fold change of >1, using the Bioconductor ‘limma’ R package (http://bioconductor.org/packages/release/bioc/html/limma.html) [10,11]. For the network statistical analyses, NodeXL version 1.0.1.361 (The Social Media Research Foundation, Belmont, USA) was used.
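
The first stage can be illustrated with a minimal Python sketch of LASSO neighborhood selection: each gene is regressed on all other genes, and genes with non-zero coefficients are taken as its neighbors. This is only an illustration under stated assumptions, not the authors' implementation. The coordinate-descent solver, the hand-picked penalty `lam`, the "or" rule for combining the two regressions of an edge, and the simulated genes `G1` to `G4` are all hypothetical; the study instead chose the penalty to control the probability of false neighbors [8,9].

```python
import random

def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5 or 1.0
    return [(v - mean) / sd for v in values]

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 for standardized columns."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Correlation of x_j with the partial residual (x_j.x_j / n is 1
            # for a standardized, non-constant column).
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta:
                resid = [r - delta * a for a, r in zip(xj, resid)]
                beta[j] = new
    return beta

def neighborhood_graph(expr, lam, rule="or"):
    """expr maps gene name -> expression values; returns a set of edges."""
    genes = list(expr)
    z = {g: standardize(expr[g]) for g in genes}
    selected = {}
    for g in genes:
        others = [h for h in genes if h != g]
        beta = lasso_coordinate_descent([z[h] for h in others], z[g], lam)
        selected[g] = {h for h, b in zip(others, beta) if b != 0.0}
    edges = set()
    for g in genes:
        for h in selected[g]:
            # "or": keep the edge if either regression selects it;
            # "and": require both regressions to select it.
            if rule == "or" or g in selected[h]:
                edges.add(frozenset((g, h)))
    return edges

# Simulated data (hypothetical): G2 depends on G1, and G4 depends on G3.
random.seed(0)
n = 400
g1 = [random.gauss(0, 1) for _ in range(n)]
g3 = [random.gauss(0, 1) for _ in range(n)]
expr = {
    "G1": g1,
    "G2": [v + 0.3 * random.gauss(0, 1) for v in g1],
    "G3": g3,
    "G4": [v + 0.3 * random.gauss(0, 1) for v in g3],
}
edges = neighborhood_graph(expr, lam=0.3)
print(sorted(tuple(sorted(e)) for e in edges))
```

The degree of a gene is then simply the number of edges that contain it, which is the quantity used to rank hub genes.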

RESULTS

In the TNBN gene regulatory network, a total of 10,237 vertices and 17,773 edges were observed. The graph density was 0.0003, the maximum vertex-to-vertex distance was 28, and the average vertex-to-vertex distance was 8.6, which means that the shortest path between two connected vertices spans, on average, between eight and nine edges. Statistical results for the TPBN gene regulatory network were similar (Table 2).

Table 2

Gene regulatory network statistics of triple-negative and triple-positive breast neoplasms

Network statistics | TNBN | TPBN
Vertices | 10,237 | 8,930
Total edges | 17,773 | 15,223
Maximum geodesic distance (diameter) | 28 | 29
Average geodesic distance | 8.635109 | 8.649705
Graph density | 0.000339225 | 0.000381835

TNBN=triple-negative breast neoplasm; TPBN=triple-positive breast neoplasm.
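
As a rough check on Table 2, the graph density of an undirected network without self-loops is 2E / (V(V-1)), and the geodesic statistics are obtained from shortest-path lengths. The Python sketch below assumes that definition of density (the vertex and edge counts in Table 2 are consistent with it) and uses a hypothetical four-vertex path graph to illustrate the distance calculation; it is not the NodeXL implementation.

```python
from collections import deque

def graph_density(n_vertices, n_edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))

def geodesic_stats(adj):
    """Return (average, maximum) shortest-path length over connected pairs.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    total = count = longest = 0
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != source:
                total += d
                count += 1
                longest = max(longest, d)
    return total / count, longest

# Densities recomputed from the vertex and edge counts in Table 2.
print(graph_density(10237, 17773))  # TNBN
print(graph_density(8930, 15223))   # TPBN

# Hypothetical toy graph: a path a-b-c-d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(geodesic_stats(path))
```

Because both networks are very sparse, the density is dominated by the ratio of edges to the square of the number of vertices, which explains why the smaller TPBN network has the slightly higher density.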

In the network centrality analysis, the TNBN gene regulatory network revealed the genes RAPGEF6, GTF2A1, and ASXL2 to have the highest hub vertex degree with 38 edges (Tables 3, 4, 5). In addition, we conducted a hub vertex analysis with the betweenness centrality and eigenvector centrality. ZDHHC20 had the highest value (2593718.407) for the betweenness centrality, whereas ASXL2 had the highest value (0.019) for the eigenvector centrality.

Table 3

Degree centrality results for triple-negative and triple-positive breast neoplasms

TNBN gene | TNBN value | TPBN gene | TPBN value
ASXL2 | 38 | CLOCK | 39
GTF2A1 | 38 | REST | 32
RAPGEF6 | 38 | ATP5D | 30
ZDHHC20 | 35 | TGFBR2 | 28
CCNT1 | 33 | ASXL2 | 27
PDGFRB | 32 | STRN | 26
REST | 31 | RBM27 | 25
TAOK1 | 31 | RIF1 | 25
RIF1 | 30 | CCNT1 | 25
ATP5D | 30 | ZEB2 | 25

TNBN=triple-negative breast neoplasm; TPBN=triple-positive breast neoplasm.

Table 4

Betweenness centrality results for triple-negative and triple-positive breast neoplasms

TNBN gene | TNBN value | TPBN gene | TPBN value
ZDHHC20 | 2593718.407 | CLOCK | 1256280.673
RAPGEF6 | 2147541.505 | GYPC | 938225.682
ZNF192 | 1835667.961 | HIC1 | 922622.728
GTF2A1 | 1468223.374 | REST | 890485.116
RIF1 | 1437912.482 | ATP5D | 812258.220
ATP5D | 1369994.673 | FAM108A1 | 794508.449
ASXL2 | 1366586.056 | GTF2A1 | 773790.744
REST | 1339309.745 | TRAPPC5 | 772846.398
CCNT1 | 1198033.135 | ASXL2 | 759382.681
GMCL1 | 1040959.614 | CCDC12 | 686844.284

TNBN=triple-negative breast neoplasm; TPBN=triple-positive breast neoplasm.

Table 5

Eigenvector centrality results for triple-negative and triple-positive breast neoplasms

TNBN gene | TNBN value | TPBN gene | TPBN value
ASXL2 | 0.019 | CLOCK | 0.016
GTF2A1 | 0.017 | CCNT1 | 0.013
REST | 0.016 | REST | 0.012
CCNT1 | 0.014 | ASXL2 | 0.010
ZDHHC20 | 0.014 | STRN | 0.009
UHMK1 | 0.013 | NCOA2 | 0.009
RAPGEF6 | 0.012 | UHMK1 | 0.008
NCOA2 | 0.010 | EXOC6B | 0.007
LMTK2 | 0.009 | RC3H2 | 0.006
TAOK1 | 0.009 | SHPRH | 0.006

TNBN=triple-negative breast neoplasm; TPBN=triple-positive breast neoplasm.
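
Tables 3 and 4 rank genes differently because degree counts direct neighbors, whereas betweenness counts the shortest paths that pass through a vertex. The Python sketch below illustrates this with Brandes' algorithm for unnormalized betweenness on an unweighted, undirected graph. The toy graph (two triangles joined through a bridge vertex `x`) is hypothetical, and whether NodeXL uses the same algorithm internally is an assumption.

```python
from collections import deque

def degree_centrality(adj):
    """Number of neighbors of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy graph: triangles {a, b, c} and {d, e, f} joined via x.
edge_list = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "x"), ("x", "d")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg = degree_centrality(adj)
bc = betweenness_centrality(adj)
for v in sorted(adj):
    print(v, deg[v], bc[v])
```

In this toy graph the bridge vertex `x` has one of the lowest degrees but the highest betweenness, which mirrors how a gene such as ZDHHC20 can lead the betweenness ranking without having the highest degree.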

For the network clustering analysis, we used the Clauset-Newman-Moore algorithm to divide the network into groups. In the TNBN gene regulatory network, there were 352 groups showing a modularity value of 0.825, and the largest group contained 1,241 vertices and 2,421 edges. On the other hand, there were 456 groups showing a modularity value of 0.820 in the TPBN gene regulatory network, with the largest group containing 1,153 vertices and 2,377 edges. In the TNBN gene regulatory network, the largest and second largest groups were connected most frequently (Figure 1), whereas the second and third largest groups were connected most frequently in the TPBN gene regulatory network (Figure 2).
Figure 1

Cluster analysis of the triple-negative breast neoplasm gene regulatory network using the Clauset-Newman-Moore algorithm. The largest group (blue) and the second largest group (sky-blue) are connected the most frequently.

Figure 2

Cluster analysis of the triple-positive breast neoplasm gene regulatory network using the Clauset-Newman-Moore algorithm. The second largest group (red) and the third largest group (green) are connected the most frequently.
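
The modularity values reported above score how well a partition separates densely connected groups. The Clauset-Newman-Moore algorithm greedily merges groups to increase this score. The Python sketch below only evaluates the modularity Q of a given partition, using the standard definition Q = sum over groups of (internal edges / m) - (group degree sum / 2m)^2; the toy graph and its partition are hypothetical, and the greedy merging step itself is not shown.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: vertex -> community label.
    """
    m = len(edges)
    internal = {}     # edges with both endpoints in the community
    degree_sum = {}   # total degree of the community's vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical toy graph: two triangles joined by a single edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in two_groups}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles gives a clearly positive Q, whereas placing every vertex in one group gives Q = 0, so values above 0.8, as in both networks here, indicate a strongly modular structure.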

Gene regulatory networks in nature are generally known to follow a power law. The distributions of vertex degrees were therefore expected to follow a power law, defined by $P(k) \sim k^{-r}$, where $P(k)$ is the proportion of vertices with degree $k$ and $r$ is the exponent. Using log-transformed values, we performed a regression analysis on the observed vertex degree and density values of the two gene regulatory networks (Figure 3, Supplementary Table 1, available online). The slopes in Supplementary Table 1 are the estimated $-r$ values. We noted that the results adequately satisfied the power-law distribution.
Figure 3

Regression analysis of the observed vertex degree and density values. (A) For the triple-negative group (Degree exist in TN), the regression has a slope of -2.823, an adjusted R² of 0.882, and p<0.001, which is consistent with a power-law distribution. (B) For the triple-positive group (Degree exist in TP), the regression has a slope of -2.727, an adjusted R² of 0.897, and p<0.001, which is consistent with a power-law distribution.

Degree exist in TN=triple-negative breast neoplasm group; Degree exist in TP=triple-positive breast neoplasm group.
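
The regression behind Figure 3 can be sketched as an ordinary least-squares fit of log density on log degree, whose slope estimates -r. The Python example below assumes base-10 logarithms and uses hypothetical degree counts constructed to be exactly proportional to k^-2; the actual degree distributions are in Supplementary Table 1 and are not reproduced here.

```python
import math

def fit_power_law(degree_counts):
    """Fit log10(density) = intercept + slope * log10(degree).

    degree_counts maps degree k (k >= 1) to the number of vertices with
    that degree. Returns (slope, adjusted R^2); slope estimates -r.
    """
    total = sum(degree_counts.values())
    xs = [math.log10(k) for k in degree_counts]
    ys = [math.log10(c / total) for c in degree_counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 for a model with one predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, adj_r2

# Hypothetical counts proportional to k^-2 (3600 / k^2).
toy_counts = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144, 6: 100}
slope, adj_r2 = fit_power_law(toy_counts)
print(slope, adj_r2)
```

On real data the fit is noisier, especially in the sparse high-degree tail, which is one reason the adjusted R² values in Figure 3 are below 1.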

In addition, to confirm the subtype-specific clinical relevance, the subset of patients was selected by the status of ER, PR, and HER2 for analysis by Cox proportional hazard regression. Survival analyses were performed on three genes (CLOCK, RAPGEF6, and ZDHHC20), and the hazard ratio and p-value are shown in Supplementary Table 2 (available online). The analysis on the three genes revealed that the HER2-negative, ER-positive, and PR-positive groups had meaningful hazard ratios in both univariate and multivariate analyses. In the multivariate survival analysis, the survival rate tended to decrease with a higher expression of CLOCK, ZDHHC20, and RAPGEF6, with hazard ratios of 1.76, 1.54, and 1.51, respectively (Table 6).

Table 6

Cox regression based on clinical variables and hub genes

Clinical variable | Univariate HR (95% CI) | Univariate p-value | Multivariate HR (95% CI) | Multivariate p-value
Age at diagnosis (yr) | 1.250 (1.081–1.445) | 0.002 | 1.423 (1.137–1.782) | 0.002
Stage
 I | Reference | | Reference |
 II | 1.516 (0.800–2.871) | 0.201 | 1.225 (0.523–2.868) | 0.640
 III | 2.645 (1.349–5.187) | 0.004 | 2.792 (1.115–6.987) | 0.028
 IV | 4.737 (1.990–11.289) | <0.001 | 5.431 (1.443–20.433) | 0.012
ER
 Positive | Reference | | Reference |
 Negative | 1.582 (1.034–2.420) | 0.034 | 1.205 (0.437–3.323) | 0.718
PR
 Positive | Reference | | Reference |
 Negative | 1.674 (1.119–2.505) | 0.012 | 1.831 (0.687–4.879) | 0.226
HER2
 Positive | Reference | | Reference |
 Negative | 0.313 (0.170–0.576) | <0.001 | 0.465 (0.240–0.898) | 0.022
CLOCK | 1.523 (1.238–1.874) | <0.001 | 1.779 (1.297–2.440) | <0.001
RAPGEF6 | 1.184 (1.000–1.402) | 0.050 | 1.508 (1.108–2.053) | 0.009
ZDHHC20 | 1.148 (0.974–1.352) | 0.100 | 1.565 (1.179–2.079) | 0.002

HR=hazard ratio; CI=confidence interval; ER=estrogen receptor; PR=progesterone receptor; HER2=human epidermal growth factor receptor 2.
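
The entries in Table 6 can be related to the underlying Cox coefficients: a hazard ratio is exp(beta), and a Wald-type 95% confidence interval is exp(beta ± 1.96 × SE). The Python sketch below recovers beta, its standard error, and a two-sided Wald p-value from a reported hazard ratio and interval. That the published intervals and p-values are Wald-based is an assumption; the article does not state how they were computed.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def hr_to_coefficient(hr, lower, upper):
    """Recover (beta, SE) from a hazard ratio and its 95% CI."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se

def coefficient_to_hr(beta, se):
    """Return (HR, lower, upper) for a Cox coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

# Multivariate estimates for RAPGEF6 and CLOCK, taken from Table 6.
beta_r, se_r = hr_to_coefficient(1.508, 1.108, 2.053)
beta_c, se_c = hr_to_coefficient(1.779, 1.297, 2.440)
print(round(wald_p_value(beta_r, se_r), 3))
print(wald_p_value(beta_c, se_c) < 0.001)
print(coefficient_to_hr(beta_c, se_c))
```

Under this approximation the recovered p-values agree with the reported ones to the precision shown in the table, which suggests the tabulated intervals are close to symmetric on the log scale.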

DISCUSSION

In network analysis, the most important feature is the hub vertex distribution. The TNBN and TPBN gene regulatory networks both showed a scale-free characteristic. This means that, unlike a random network, the TNBN gene regulatory network could be easily destroyed if the hub vertices were attacked. Among the TNBN gene regulatory network hub vertices, the most interesting genes were RAPGEF6 and ZDHHC20. These cancer-related genes ranked consistently high in the centrality analyses of the TNBN network but not in those of the TPBN network. Draper and Smith [12] have reported that ZDHHC20 was associated with cellular transformation and cell proliferation, but to the best of our knowledge, its relationship to breast cancer has thus far not been fully elucidated. ZDHHC20 may therefore be a targetable hub vertex in TNBNs. In addition, RAPGEF6 is known to catalyze the exchange of GDP for GTP on the Ras-related proteins Rap1 and Rap2, which are cell-junction related proteins [13]. Activated Rap1 interacts with JamA, Bag3, Afadin, Riam, and RapL to regulate cadherin and integrin, which are connected to the cell junction and extracellular matrix [14]. Thus, if we want to identify changes in cell-to-cell interactions in TNBNs, it is essential to study RAPGEF6.

CLOCK was at the top of all three centrality analyses of TPBN. Interestingly, in TNBNs, CLOCK had a degree of 5, with eigenvector and betweenness centrality values of 0.003 and 215085.373, respectively. These results suggest that even though TNBN may not be affected by hormonal dysregulation, the oncogenic property of CLOCK may still contribute to its genesis. The finding that higher expression of CLOCK, ZDHHC20, and RAPGEF6 was related to a lower survival rate also supports the importance of these genes in each network group. In addition, the ASXL2, CCNT1, and NCOA2 genes were frequently observed in the centrality analyses of both groups. These genes are well known in tumorigenesis [15,16,17].
Thus, by conducting a thorough network analysis, we can find not only well-known cancer genes but also genes that are less well known in cancers. In fact, in the DEG analysis, RAPGEF6 was the only one of these genes that showed a meaningful p-value (Bonferroni) between the two groups (Supplementary Table 3, available online). In addition, we compared these genes using 100 normal and 1,084 cancer samples. The genes CCNT1 and ASXL2 showed a p-value of less than 0.001 (Bonferroni), whereas the other genes showed a p-value of 1. Therefore, the DEG analysis using cancer and normal samples suggests that the important genes found by the network analysis cannot be found by DEG analysis (Supplementary Table 4, available online). Through network analysis, we have attempted to understand the physiology of TNBNs. The TNBN and TPBN gene regulatory networks showed similar network statistics, with both having similar network densities, diameters, average vertex-to-vertex distance values, and scale-free network characteristics. However, the TNBN gene regulatory network was less clustered than the TPBN gene regulatory network, albeit showing a similar modularity. In addition, the hub vertices differed between the two groups. Although we could not conduct specific analyses on each cluster in the TNBN gene regulatory network, we were able to find some oncogenes through the centrality analyses.
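
The Bonferroni-corrected p-values of exactly 1 mentioned above follow from how the correction works: each raw p-value is multiplied by the number of tests and capped at 1. The Python sketch below applies this correction together with the DEG cutoffs described in the Methods (corrected p<0.05 and absolute log fold change >1). The gene names and values are hypothetical, and the moderated t-statistics computed by limma are not reproduced.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def select_degs(results, alpha=0.05, min_abs_lfc=1.0):
    """Select genes passing both the corrected p-value and fold-change cutoffs.

    results maps gene -> (log2 fold change, raw p-value).
    """
    genes = list(results)
    adjusted = bonferroni([results[g][1] for g in genes])
    return [g for g, p in zip(genes, adjusted)
            if p < alpha and abs(results[g][0]) > min_abs_lfc]

# Hypothetical results for four genes.
toy_results = {
    "GENE_A": (2.3, 1e-6),   # large change, very small p: selected
    "GENE_B": (0.4, 1e-6),   # significant but small fold change
    "GENE_C": (-1.8, 0.02),  # 0.02 * 4 = 0.08, not significant
    "GENE_D": (1.5, 0.5),    # 0.5 * 4 = 2, capped at 1
}
print(bonferroni([p for _, p in toy_results.values()]))
print(select_degs(toy_results))
```

With thousands of genes tested, any raw p-value that is not very small is pushed to the cap of 1, so a hub gene can be central in the network and still fail the DEG cutoffs.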
  14 in total

Review 1.  Triple-negative breast cancer.

Authors:  William D Foulkes; Ian E Smith; Jorge S Reis-Filho
Journal:  N Engl J Med       Date:  2010-11-11       Impact factor: 91.245

2.  Identification and characterization of RA-GEF-2, a Rap guanine nucleotide exchange factor that serves as a downstream target of M-Ras.

Authors:  X Gao; T Satoh; Y Liao; C Song; C D Hu; K Kariya Ki; T Kataoka
Journal:  J Biol Chem       Date:  2001-08-27       Impact factor: 5.157

3.  Estimation of Directed Acyclic Graphs Through Two-stage Adaptive Lasso for Gene Network Inference.

Authors:  Sung Won Han; Gong Chen; Myun-Seok Cheon; Hua Zhong
Journal:  J Am Stat Assoc       Date:  2016-10-18       Impact factor: 5.033

4.  Cyclin T1 overexpression induces malignant transformation and tumor growth.

Authors:  Cristian Moiola; Paola De Luca; Kevin Gardner; Elba Vazquez; Adriana De Siervi
Journal:  Cell Cycle       Date:  2010-08-01       Impact factor: 4.534

5.  Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies.

Authors:  Fiona M Blows; Kristy E Driver; Marjanka K Schmidt; Annegien Broeks; Flora E van Leeuwen; Jelle Wesseling; Maggie C Cheang; Karen Gelmon; Torsten O Nielsen; Carl Blomqvist; Päivi Heikkilä; Tuomas Heikkinen; Heli Nevanlinna; Lars A Akslen; Louis R Bégin; William D Foulkes; Fergus J Couch; Xianshu Wang; Vicky Cafourek; Janet E Olson; Laura Baglietto; Graham G Giles; Gianluca Severi; Catriona A McLean; Melissa C Southey; Emad Rakha; Andrew R Green; Ian O Ellis; Mark E Sherman; Jolanta Lissowska; William F Anderson; Angela Cox; Simon S Cross; Malcolm W R Reed; Elena Provenzano; Sarah-Jane Dawson; Alison M Dunning; Manjeet Humphreys; Douglas F Easton; Montserrat García-Closas; Carlos Caldas; Paul D Pharoah; David Huntsman
Journal:  PLoS Med       Date:  2010-05-25       Impact factor: 11.069

6.  Paclitaxel plus bevacizumab versus paclitaxel alone for metastatic breast cancer.

Authors:  Kathy Miller; Molin Wang; Julie Gralow; Maura Dickler; Melody Cobleigh; Edith A Perez; Tamara Shenkier; David Cella; Nancy E Davidson
Journal:  N Engl J Med       Date:  2007-12-27       Impact factor: 91.245

7.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

8.  ASXL2 promotes proliferation of breast cancer cells by linking ERα to histone methylation.

Authors:  U-H Park; M-R Kang; E-J Kim; Y-S Kwon; W Hur; S K Yoon; B-J Song; J H Park; J-T Hwang; J-C Jeong; S-J Um
Journal:  Oncogene       Date:  2015-12-07       Impact factor: 9.867

9.  Bagging statistical network inference from large-scale gene expression data.

Authors:  Ricardo de Matos Simoes; Frank Emmert-Streib
Journal:  PLoS One       Date:  2012-03-30       Impact factor: 3.240

10.  The gene regulatory network for breast cancer: integrated regulatory landscape of cancer hallmarks.

Authors:  Frank Emmert-Streib; Ricardo de Matos Simoes; Paul Mullan; Benjamin Haibe-Kains; Matthias Dehmer
Journal:  Front Genet       Date:  2014-02-03       Impact factor: 4.599
