Anastasia Kougioumtzi1, Panayiotis Tsaparas2, Angeliki Magklara3. 1. Department of Biological Applications and Technologies, University of Ioannina, Ioannina, Greece. 2. Department of Computer Science and Engineering, University of Ioannina, Ioannina, Greece. 3. Laboratory of Clinical Chemistry, School of Medicine, University of Ioannina, Ioannina, Greece; Foundation of Research and Technology-Hellas, Institute of Molecular Biology & Biotechnology, Department of Biomedical Research, Ioannina, Greece.
Abstract
Despite the pleiotropic effects of the progesterone receptor in breast cancer, the molecular mechanisms in play remain largely unknown. To gain a global view of the PR-orchestrated networks, we used next-generation sequencing to determine the progestin-regulated transcriptome in T47D breast cancer cells. We identify a large number of PR target genes involved in critical cellular programs, such as regulation of transcription, apoptosis, cell motion and angiogenesis. Integration of the transcriptomic data with the PR-binding profiling of hormonally treated cells identifies numerous components of the small-GTPases signaling pathways as direct PR targets. Progestin-induced deregulation of the small GTPases may contribute to the PR's role in mammary tumorigenesis. Transcript expression analysis reveals significant expression changes of specific transcript variants in response to the extracellular hormonal stimulus. Using the NET1 gene as an example, we show that the PR can dictate alternative promoter usage leading to the upregulation of an isoform that may play a role in metastatic breast cancer. Future studies should aim to characterize these selectively regulated variants and evaluate their clinical utility in prognosis and targeted therapy of hormonally responsive breast tumors.
Despite the pleiotropic effects of the progesterone receptor in breast cancer, the molecular mechanisms in play remain largely unknown. To gain a global view of the PR-orchestrated networks, we used next-generation sequencing to determine the progestin-regulated transcriptome in T47Dbreast cancer cells. We identify a large number of PR target genes involved in critical cellular programs, such as regulation of transcription, apoptosis, cell motion and angiogenesis. Integration of the transcriptomic data with the PR-binding profiling of hormonally treated cells identifies numerous components of the small-GTPases signaling pathways as direct PR targets. Progestin-induced deregulation of the small GTPases may contribute to the PR's role in mammary tumorigenesis. Transcript expression analysis reveals significant expression changes of specific transcript variants in response to the extracellular hormonal stimulus. Using the NET1 gene as an example, we show that the PR can dictate alternative promoter usage leading to the upregulation of an isoform that may play a role in metastatic breast cancer. Future studies should aim to characterize these selectively regulated variants and evaluate their clinical utility in prognosis and targeted therapy of hormonally responsive breast tumors.
Progesterone is a steroid hormone that plays a pivotal role in female physiology by co-coordinating diverse aspects of the reproductive system [1] and regulating mammary gland morphogenesis [2]. It exerts its actions through the two isoforms of the progesterone receptor (PR-A and PR-B), which are ligand-activated transcription factors that belong to the nuclear receptor super-family. Deregulation of progesterone signaling is implicated in the development and progression of cancer in the hormone's target tissues [3]. In breast cancer the role of PR is well documented both in vivo and in vitro
[4]. Experiments in PR knock-out mice demonstrated that progestins promote mammary tumor progression and growth [2], [5], [6]. Two large clinical studies in women [7], [8] have also provided supporting evidence for a tumorigenic role of progesterone in the mammary tissue. In vitro studies have confirmed that progestin treatment affects important cellular programs, such as proliferation, apoptosis and differentiation [3], [9], all of which have the potential to lead to a malignant phenotype when deregulated.To develop effective therapeutic schemes against PR signaling in breast cancer, a major requirement is the determination of the full repertoire of progestin-regulated genes in target cells. Gene expression microarray studies have been useful in characterizing transcriptional effects of progestin signaling [10], [11], [12], [13], [14], [15]. However, this approach is lacking due to high levels of noise, relatively low sensitivity and limited number of array probes suggesting that a plethora of PR-regulated genes may still remain undetected.More than 90% of human genes can generate multiple transcript variants, which are, oftentimes, designated with tissue- or developmental- specific functional roles [16]. A growing number of studies have demonstrated the expression of cancer-associated variants participating in specific cellular programs, including apoptosis, cell growth, angiogenesis and cell motility, during tumor initiation and progression [17], [18]. Detection and characterization of such variants can improve our understanding of the molecular mechanisms in play; it can also have significant impact in the clinic, since they have emerged as a promising tool for the diagnosis and management of the disease [19], [20]. Studies using exon-specific microarrays have identified estrogen-regulated transcript variants in breast cancer cell lines [21], [22]. However, it is currently unknown to what extent progestin-regulated transcript variants contribute to the expression profile of breast cancer cells. The gene expression microarrays studies described above could not discriminate between variants and the reports of up- or down-regulation of mRNA expression levels are confounded by the effects of mixtures of these transcripts [19].To address the above issues, we employed paired-end, next-generation sequencing (NGS) to interrogate the transcriptome of vehicle- and progestin- treated T47Dbreast cancer cells in an unbiased way. We identified hundreds of PR regulated genes that participate in important cellular processes, including apoptosis and transcription as reported before [14], but also angiogenesis and cell migration. More importantly, we identified a novel group of PR targets that are involved in small-GTPases signaling. By employing ChIP-seq experiments for the PR, we showed that many of these genes were under direct progestin transcriptional regulation via the receptor's binding in their promoters or distal enhancer elements. Small-GTPases signaling pathways are widely implicated in normal physiology and disease [23]. According to our data some of them may be subject to PR-regulation and may be mediating the receptor's effects in breast cancer cells. On the transcript level we find that the receptor can dictate alternative promoter use decisions leading to significant expression changes of specific transcript variants as a response to the external hormonal stimulus. Transcript variants often encode for unique protein isoforms with different, even antagonizing, functions [24]; consequently, expression data on the transcript level are necessary to paint an accurate picture of the PR-regulated proteome.In overall, our findings provide new insights into the molecular mechanisms of PR signaling and the progestin-regulated transcriptome of breast cancer cells.
Materials and Methods
Cell culture and Reagents
T47D cells were purchased from ATCC and were grown in RPMI supplemented with 10% fetal calf serum and glutamine (Life Technologies) at 37°C under 5% CO2. Before hormonal treatment (10 nM R5020) cells were plated in RPMI/charcoal-stripped fetal bovine serum either 24 hours (RNA isolation) or 3 days (ChIP assays) before harvesting. The α-PR antibody (catalog no. sc-7208) was purchased from Santa Cruz Biotechnology, Inc. (Santa Cruz, CA), the α-PolII was from Covance (catalog no. MMS-126R) and the antibody against mono/di/trimethyl-Histone H3 (Lys4) was from Millipore (clone AW304 catalog no 04-791).
Reverse Transcription qPCR
Total RNA was extracted from cells using the Trizol reagent (Life Technologies), and 2.5 µg of RNA were reverse transcribed using the RevertAid First-Strand cDNA Synthesis System (ThermoCcientific) according to the manufacturers' instructions. cDNA was amplified by quantitative PCR (qPCR) using the SYBR FAST Universal 2X qPCR Master Mix (KAPA Biosystems). All experiments were performed in at least 3 biological replicates. Statistical analysis was performed by Student's t-test (compared with vehicle treatment). Primer sequences used for RT-qPCR are available upon request.
Chromatin immunoprecipitation assays
ChIPs were performed as described before [25]. Briefly, T47D cells were grown for 3 days in RPMI/charcoal-treated fetal calf serum and then treated with 10 nm R5020 for 1 hr. Fixation with 1% formaldehyde proceeded for 10 min at 37°C and was stopped by the addition of glycine to a final concentration of 0.125 M. Cells were harvested, resuspended in lysis buffer (50 mM Tris-HCl pH 8, 10 mm EDTA, 1% sodium dodecyl sulfate) and fragmented chromatin (500–1000 bp) was generated through sonication (Misonix sonicator). Samples were diluted in ChIP dilution buffer (1.2 mM EDTA; 167 mM NaCl; 16.7 mM Tris-HCl, pH 8; 1.1% Triton X-100; and 0.01% SDS), precleared for 2 hr at 4°C with protein A+G magnetic beads (Life Technologies) and used for the ChIP assays with the addition of 5 µg of antibody. The next day the recovered immunocomplexes were washed with the following buffers: washing buffer I (2 mM EDTA; 20 mM Tris-HCl, pH 8.0; 0.1% SDS; 1% Triton X-100; 150 mM NaCl), washing buffer II (2 mM EDTA; 20 mM Tris-HCl, pH 8.0; 0.1% SDS; 1% Triton X-100; 500 mM NaCl), washing buffer III (1 mM EDTA; 10 mM Tris-HCl, pH 8.0; 1% Nonidet P-40; 1% Deoxycholate; 0.25 mM LiCl) and TE (1 mM EDTA, 10 mM Tris-HCl, pH 8.0). After elution, cross-linking was reversed by an overnight incubation at 65°C, samples were incubated with proteinase K and DNA was extracted with phenol-chloroform and EtOH precipitation. Samples were analyzed by qPCR and their enrichment over input was calculated by the 2−ΔCt method after correcting for the IgG negative controls. All experiments were performed in 2–4 biological replicates. Statistical analysis was performed by Student's t-test. Primer sequences used for ChIP-qPCR are available upon request.
Library preparation for next-generation sequencing experiments
For RNA-sequencing, total RNA was extracted from T47D cells using the Trizol reagent (Life Technologies) and was treated with Turbo DNase I (Ambion) for 30 min at 37°C. poly(A)+ RNA was isolated using oligo(dT)-conjugated magnetic beads (FastTrack MAG mRNA Isolation Kit, Life Technologies). Library preparation was performed with the ScriptSeq kit (Epicentre) according to the manufacturer's instructions. Briefly, poly(A)+ RNA was fragmented at 90°C for 6 min and was, subsequently, subjected to cDNA synthesis. cDNA was tagged at the 3' end, purified using the Agencourt AMPure XP system (BeckmanCoulter), and it was then converted to double-strand cDNA. This product was PCR-amplified for 11 cycles; during this step completion of the addition of the Illumina adaptor sequences and incorporation of an index (ScriptSeq Index PCR Primers, Epicentre) was performed. The PCR product was treated with Exo I for 15 min at 37°C and was purified as described above.For ChIP-sequencing, DNA was immunoprecipitated from ∼20 million cells, grown and treated as described above, and then it was purified and sonicated to ∼400 bp fragments using the Bioruptor (Diagenode). Fragmented DNA was used for library construction using the NEBNext ChIP-seq sample Prep Master MixSet 1 (New England Biolabs) and following the manufacturer's instructions. Briefly, this product underwent end repair, dA-tailing and adaptor ligation using the Illumina specific adaptors. In-between enzymatic steps the samples were purified using Agencourt AMPure XP system (BeckmanCoulter). The libraries were PCR-amplified for 11–14 cycles using Phusion HotStart DNA polymerase.One µl of each library was run on the Bioanalyzer to assess library quantity and quality. Libraries were run on Illumina HiSeq 2000 using 50-bp paired-end sequencing and following standard protocols.
Next-generation sequencing data analysis and differentially expressed gene (DEG) testing
50-bp paired-end sequencing was performed for PR-immunoprecipitated DNA from R5020-treated cells, for genomic input DNA and for mRNA isolated from progestin- and vehicle- treated cells. The FASTQC package (http://www.bioinformatics.babraham.ac.uk /projects/fastqc/) was used to assess quality of reads. When necessary, Trimmomatic v.0.3 [26] was used to trim reads by applying the parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.For ChIP-seq data analysis, reads were mapped to the human genome (hg19) using Bowtie v.1.1.2 (default parameters, hg19) [27]. SAM tools [28] were used to select for high quality, uniquely mapped, paired-end reads leading to 14,148,482 total reads. MACS [29] was used for peak calling with default parameters (fold enrichment ≥10 and p-value≤10-5) using an equal number of reads from input DNA as a control. For RNA-seq data analysis, after quality filtering, reads were aligned to the UCSC hg19 reference genome using TopHat2 allowing up to two mismatches and discarding reads mapping at multiple locations [30]. This led to the generation of 76,118,132 and 73,174,398 unique paired-end reads for EtOH- and R5020- treated cells respectively. Transcript abundance was quantified using Cufflinks 1.2.1. [31] and the iGenomes Ensembl GTF annotation file as a reference. The normalized expression level of each transcript was measured by FPKM (Fragments Per Kilobase of transcript per Million mapped reads). A threshold of 10 mapped reads was used to define detection at the gene level.For subsequent analyses we considered the information both at the gene and transcript level. Differentially expressed genes and transcripts were called using Cuffdiff 2 that utilizes Student's t-test to determine if two datasets are different from each other [32]. By default, Cuffdiff 2 generates the lowest p-value to be 5×10−5. Genes and transcripts showing an FPKM equal or higher than one at least under one condition (progestin- or vehicle- treated) were retained for further analysis. A threshold of 1.5 was applied on the fold change. Given that the RNA-Seq DEG algorithms generally result in much higher adjusted p-values (0.03∼0.12) than their microarray counterparts (<0.01) [33], and based on the fact that several previously identified PR-regulated genes were listed in our data with p-values up to 0.38, we decided to test higher p-values as cut-offs for the identification of PR-regulated genes. Extensive validation of our data by RT-qPCR led us to finally use a p-value cut-off of 0.15. Using higher p-values led to an increase in the number of false positives. For differentially expressed transcripts the p-value cut-off was ≤0.05. Further manipulation of the data was done with in-house scripts. All datasets have been deposited to GEO (accession number GSE51428).
Gene Ontology analysis
For functional enrichment analysis of the differentially expressed genes (DEGs) the module FatiGO [34] of the Babelomics bioinformatics suite [35] and the DAVID functional annotation tools [36] were used. Both algorithms use Fischer's exact test to check for significant over-representation of Gene Ontology (GO) annotations, but differ to the gene reference background they use. FatiGO compares DEGs with respect to the whole human genome, while DAVID is more conservative and uses genes associated with terms in the corresponding annotation categories as the reference background.
Results and Discussion
Characterization of the RNA-sequencing data
In order to determine the PR-regulated transcriptome in the breast cancer milieu, mRNA was isolated from T47D cells and was subsequently subjected to 50-bp paired-end sequencing. Reads were aligned against the UCSC hg19 reference genome using TopHat 2 [30] and transcript assembly of the aligned reads was performed using Cufflinks [31] and the Refseq database for reference gene annotation (see Materials and Methods).To evaluate our data, we performed an initial analysis on the gene level. The global profiles of gene expression between the two samples were highly correlated with the Pearson correlation coefficient being 0.97 (
). Among the top 100 most highly expressed genes (data not shown), we identified the expected housekeeping ones (e.g. GAPDH, PPIA and TUBA1B) and several genes associated with the healthy (e.g. PRLR), or the neoplastic mammary tissue (e.g. PIP, KRT19 and MUC1) [37], [38], [39]. Also included in this list there were members of the heat shock protein 90 family (HSP90-AA1, -AB1 and -B1), known molecular chaperones of SHRs [40], and several proto-oncogenes, such as AGR2
[41], RAC1
[42] and GNAS
[43]. Notably, we also found very highly expressed the guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 (GNB2L), calnexin (CANX), calreticulin (CALR) and beta-2 microglobulin (B2M); these are all genes associated with the breast cancer phenotype and they were among the most highly expressed in all breast tumors assayed by RNA-sequencing in a recent study [44].
Figure 1
Identification of PR-regulated genes by RNA-sequencing.
(A) Scatter plot of global gene expression in the EtOH- and R5020- treated T47D cells. In total, 10,997 and 10,930 genes (FPKM>1) are expressed respectively. The high Pearson correlation coefficient (r = 0.966) indicates similar expression profiles between the two samples as expected. (B) Volcano plot (p-value vs. fold change of expression) for all differentially expressed genes (DEGs) between vehicle- and progestin- treated cells. To determine DEGs, a threshold of 1.5-fold change was set and p-value cut-offs were 0.05 and 0.15 for the high (dark blue) and low (light blue) stringency groups respectively. (C) A total of 711 up-regulated and 576 down-regulated genes were identified and classified to two stringency groups, as described above.
Identification of PR-regulated genes by RNA-sequencing.
(A) Scatter plot of global gene expression in the EtOH- and R5020- treated T47D cells. In total, 10,997 and 10,930 genes (FPKM>1) are expressed respectively. The high Pearson correlation coefficient (r = 0.966) indicates similar expression profiles between the two samples as expected. (B) Volcano plot (p-value vs. fold change of expression) for all differentially expressed genes (DEGs) between vehicle- and progestin- treated cells. To determine DEGs, a threshold of 1.5-fold change was set and p-value cut-offs were 0.05 and 0.15 for the high (dark blue) and low (light blue) stringency groups respectively. (C) A total of 711 up-regulated and 576 down-regulated genes were identified and classified to two stringency groups, as described above.In overall, the above findings validate our RNA-sequencing data including the transcript assembly and the calculation of transcript abundance as FPKM values.
Identification and validation of differentially expressed genes
Gene expression microarray experiments are commonly performed after 6–24 hr of progestin treatment and inevitably identify not only primary but also secondary PR targets. The greatly enhanced sensitivity and accuracy delivered by deep-sequencing allowed us to use a shorter treatment period (3 hr) to enrich the differentially regulated genes we identify for primary PR targets. The transcriptomes of the progestin- and vehicle- treated cells were compared using Cuffdiff 2 [32], a differentially expressed genes algorithm. In total, we identified 1287 DEGs and a detailed list of them is provided in . The log2(fold change) of the ratio of EtOH-treated to R5020-treated is given. We set two stringency thresholds to classify all DEGs (see Materials and Methods). The high stringency group included 711 genes and the low stringency group included 576 genes (
). The majority of the PR targets were up-regulated after progestin treatment (896 genes), while 391 were down-regulated (
).To validate our RNA-seq experiments and data analysis, the expression levels of several DEGs from both stringency groups were assayed by RT-qPCR and a side-by-side comparison of these results with the RNA-seq data is shown in
. All genes examined were found to be progestin-regulated in agreement with the RNA-seq data. In
, we present the results from the high stringency genes. We examined a larger number of genes from the low stringency group (
) to ensure that we did not include false positives in our analysis. We observed from our RNA-seq data analysis that most genes in the low stringency group were downregulated (
), thus we selected to examine an equal number of induced and repressed genes from this group (
). For two of these genes, ABCC3 and PYCARD, the degree of downregulation was not accurately measured by RNA-seq, but in overall, changes in expression levels detected by the two techniques were very similar for most genes tested providing additional support for the accuracy of the RNA-seq data.
Figure 2
Differentially expressed genes detected by RNA-seq are validated by RT-qPCR.
T47D cells were treated for 3-qPCR using specific primers for the genes shown. Expression levels were normalized to GAPDH. A side-by-side comparison of the RT-qPCR results with the RNA-seq data is shown. (A) Expression levels of high stringency genes as measured by RT-qPCR and RNA-seq. FOXA1 was the only selected down-regulated gene and its expression pattern was confirmed. (B) Expression levels of low stringency genes as measured by RT-qPCR and RNA-seq. Five downregulated (ABCC3, GLIS2, HEXIM2, KLF3 and PYCARD) and five upregulated genes were selected and confirmed by RT-qPCR. Error bars represent the SEM.
Differentially expressed genes detected by RNA-seq are validated by RT-qPCR.
T47D cells were treated for 3-qPCR using specific primers for the genes shown. Expression levels were normalized to GAPDH. A side-by-side comparison of the RT-qPCR results with the RNA-seq data is shown. (A) Expression levels of high stringency genes as measured by RT-qPCR and RNA-seq. FOXA1 was the only selected down-regulated gene and its expression pattern was confirmed. (B) Expression levels of low stringency genes as measured by RT-qPCR and RNA-seq. Five downregulated (ABCC3, GLIS2, HEXIM2, KLF3 and PYCARD) and five upregulated genes were selected and confirmed by RT-qPCR. Error bars represent the SEM.We also compared our dataset with several other previously published datasets of PR-regulated genes generated by microarray expression experiments in T47D cells [10], [11], [12], [13], [14], [15]. We found that both stringency groups contained previously reported PR-regulated genes (350 out of the 1287) ().The above data, taken together, further confirm the validity of our RNA-seq data and analysis and lead to the identification of hundreds of new PR targets.
Early progestin-induced gene expression changes in breast cancer cells
To gain some insight into the biological significance of our data, we performed gene ontology analysis for all DEGs using the tools FatiGO [34] and DAVID [36]; both algorithms yielded similar results. GO analysis was organized around the three basic "principles" of biological process, molecular function and cellular component (
).
Figure 3
Gene ontology analysis for PR target genes.
(A) Highly enriched biological processes were determined by using FatiGO. All GO annotations related to the same biological process are shown in the same color. Grey bars represent other processes. (B) Highly enriched molecular functions were determined by using FatiGO and the top fifteen are presented here. Results are clustered by function and the percentage of DEGs that were associated with each GO annotation is shown. (C) Eight of the top ten most highly enriched GO annotations for cellular component (as determined by FatiGO) fall into 3 broad categories: plasma membrane, cell junction and Golgi apparatus. Percentage of DEGs that were identified in each category is shown.
Gene ontology analysis for PR target genes.
(A) Highly enriched biological processes were determined by using FatiGO. All GO annotations related to the same biological process are shown in the same color. Grey bars represent other processes. (B) Highly enriched molecular functions were determined by using FatiGO and the top fifteen are presented here. Results are clustered by function and the percentage of DEGs that were associated with each GO annotation is shown. (C) Eight of the top ten most highly enriched GO annotations for cellular component (as determined by FatiGO) fall into 3 broad categories: plasma membrane, cell junction and Golgi apparatus. Percentage of DEGs that were identified in each category is shown.Key biological processes associated with breast cancer initiation and progression, such as regulation of transcription, apoptosis and cell proliferation, were found to be highly enriched among PR targets (
). The DEGs involved in regulation of cell death are listed in . Interestingly, they include an, approximately, equal number of genes that promote or suppress apoptosis () in accordance with the dual role of PR in cell proliferation [45]. Right after progestin treatment T47D cells go through a proliferative phase followed by a second phase of growth inhibition [45]. Our data show that early transcriptional responses are not restricted to cell proliferative genes, but also include induction of genes with apoptotic effects. This suggests that the PR initiates simultaneously a proliferative and an anti-proliferative program and the decision which one will prevail, probably, depends upon the specific cell context. These findings fit with the hypothesis by Lange et al that progesterone acts as a priming agent that induces cellular changes that permit other factors to influence the ultimate proliferative or differentiative state of the cells [46].The most numerous target group (258 out of 1200 genes analyzed by DAVID) is comprised of genes playing a role in regulation of transcription (). It includes members of principal families of transcription factors, such as the GATA, FOX and E2F families that control a wide spectrum of biological functions. Deregulation of the expression of these factors has a crucial role in the development and progression of cancer [47], [48], [49]. Another example is the family of the Krüppel-like factors (KLF) with most of its members being under progestin regulation in T47D cells (). Our data confirm previous findings that several KLFs are PR targets (see ) and they add to this list KLFs 3, 6, and 7 (
). Recent studies have documented KLF family members in the control of cell proliferation, differentiation, and apoptosis in steroid-responsive mammary and uterine endometrial cells [50]; some of them have been implicated in breast cancer progression [50]. It is reasonable to assume that cross-talk between KLF and PR signaling may partly mediate the receptor's effects on these processes. Deregulation of PR signaling may lead to aberrant expression of KLFs and loss of control over these critical events in breast cancer.In accordance with the above analysis, we, also, find that the most highly enriched molecular functions in the DEGs are related to transcription factor activity (blue sectors in
), supporting the notion that the PR initiates a new transcriptional program in the cell as a response to the external stimulus.Other biological pathways that are overrepresented are related to cell motility and angiogenesis (
), which are key steps in cancer cell growth and metastasis. Shortly after progestin treatment the expression status of several genes involved in cell motion () is affected substantiating previous reports of progestin-induced breast cancer cell migration and invasion [51], [52]. Moreover, a significant number of PR targets are components of cell junctions (cell to cell and cell to extracellular matrix) (
) confirming that the receptor plays an important role in regulating cell motion. Our data also offer new insight into the role of PR in angiogenesis by identifying dozens of target genes involved in this process (). VEGFA, a potent angiogenic factor that stimulates breast cancer cell growth in vitro and in vivo
[53] and THBS1, an inhibitor of angiogenesis that promotes tumor progression and metastasis [54], had been reported before as PR-regulated genes [55], [56], [57], [58]. We find that other well-known angiogenic factors, such as ACVR1 and EDN1 are, also, under progestin control (). These data strongly suggest that PR-orchestrated networks are involved in cell motility and vasculature development in breast cancer cells. It is plausible that aberrant PR signaling in breast tumors may be a contributing factor to tumor vascularization and metastasis.Notably, the same functional categories are also enriched when examining the high stringency group alone, confirming the validity of our differential gene analysis (data not shown).
Components of the small-GTPase signaling pathways are direct PR targets
The GO analysis described above revealed that a significant fraction of PR targets is involved in "small-GTPases mediated signal transduction" (
), most likely functioning as "GTPase regulators" (
). This was a novel finding that raised our interest, since deregulation of small-GTPases signaling may play a role in cancer progression [23]. The small GTPases are small G-proteins that can bind and hydrolyze GTP, cycling in this way between an inactive (GDP-bound) and an active (GTP-bound) state. Their activity is regulated by GTPase activating proteins (GAPs) and guanine nucleotide exchange factors (GEFs). They are localized to multiple membrane compartments including the plasma membrane and the Golgi apparatus and this is, probably, the reason behind the significant number of PR targets associated with these cellular components (
).Functional annotation using FaTIGO [34] and DAVID [36] and manual inspection of all DEGs identified 74 genes in total involved in small-GTPases signal transduction. The vast majority of them (59 of 74) are small-GTPases that belong to the Ras superfamily or they are regulators of these enzymes (
). Several of these genes were experimentally validated by RT-qPCR (
). Three hours after progestin treatment all genes assayed were significantly up-regulated, except RERG and VAV3 that were repressed in agreement with the RNA-seq data ().
Table 1
Progestin-regulated small-GTPases, GAPs and GEFs.
Ras family
Small GTPases
GAPs
GEFs
LRRK2
RASA2
PLCE
RASL10B
RASA3
RASGEF1A
RERG
RASA4
RASGRP1
RRAS2
RASAL1
SOS1
SIPA1
Rho family
Small GTPases
GAPs
GEFs
RHOB
ARHGAP17
AKAP13
RHOBTB2
ARHGAP23
ARHGEF2
RHOU
ARHGAP26
ARHGEF26
RHOV
ARHGAP32
ARHGEF37
RND1
ARHGAG40
FGD3
ARHGAP42
FGD4
SRGAP1
ITSN1
SRGAP2
NET1
SRGAP3
PLEKHG6
SRGAP2B
SPATA13
STARD13
VAV3
Arf family
Small GTPases
GAPs
GEFs
ARF6
ASAP2
CYTH1
ARL4C
IQSEC1
ARL4D
PSD4
ARL5B
Rab family
Small GTPases
GAPs
RAB12
TBC1D12
RAB33B
TBC1D20
RAB4A
TBC1D4
RAB9B
Rap family
Small GTPases
GAPs
GEFs
RAP2B
RAP1GAP2
RAPGEF6
RAP2C
Figure 4
Components of small-GTPases signaling pathways are direct PR targets.
(A) T47D cells, treated with R5020 or EtOH for 3 hrs, were used for RNA extraction. RT-qPCR was performed using specific primers for two small GTPases (ARF6, RERG) and several GEFs (AKAP13, FGD4, NET1, PSD4, VAV3) and GAPs (ASAP2, ARHGAP17, ARHGAP42, RAP1GAP2). Expression levels were normalized to GAPDH. Error bars indicate the SEM. (p-value<0.005, single asterisk indicates p-value<0.5 and double asterisk p-value<0.05) (B) Visualization of the ChIP-seq data on the UCSC browser. PR binding sites in an intron of the NET1 gene and in distal intragenic regions of FGD4 and AKAP13 are depicted by the black blocks. (C) ChIP-qPCR experiments show increased PR binding in the three sites shown in (B) after 1 hr of progestin treatment (single asterisk indicates p-value<0.05, double asterisk p-value<0.01 and triple asterisk p-value<0.001). The promoter of GAPDH is used as a negative control. (D) ChIP-qPCR experiments for methylated H3K4. The PR binding sites shown in (B) are also enriched for this mark (p-value<0.05) of active enhancers and promoters. An intergenic region is used as a negative control. Error bars represent SE.
Components of small-GTPases signaling pathways are direct PR targets.
(A) T47D cells, treated with R5020 or EtOH for 3 hrs, were used for RNA extraction. RT-qPCR was performed using specific primers for two small GTPases (ARF6, RERG) and several GEFs (AKAP13, FGD4, NET1, PSD4, VAV3) and GAPs (ASAP2, ARHGAP17, ARHGAP42, RAP1GAP2). Expression levels were normalized to GAPDH. Error bars indicate the SEM. (p-value<0.005, single asterisk indicates p-value<0.5 and double asterisk p-value<0.05) (B) Visualization of the ChIP-seq data on the UCSC browser. PR binding sites in an intron of the NET1 gene and in distal intragenic regions of FGD4 and AKAP13 are depicted by the black blocks. (C) ChIP-qPCR experiments show increased PR binding in the three sites shown in (B) after 1 hr of progestin treatment (single asterisk indicates p-value<0.05, double asterisk p-value<0.01 and triple asterisk p-value<0.001). The promoter of GAPDH is used as a negative control. (D) ChIP-qPCR experiments for methylated H3K4. The PR binding sites shown in (B) are also enriched for this mark (p-value<0.05) of active enhancers and promoters. An intergenic region is used as a negative control. Error bars represent SE.The above data demonstrate that several small-GTPases, GAPs and GEFs are early (immediate) PR targets. Next, we asked whether these genes were, also, direct PR targets, where the receptor directly controlled their expression by binding to regulatory sequences. To this end we performed ChIP experiments with an antibody against the PR followed by paired-end NGS to accurately map the receptor's binding sites. In overall, we identified 11,180 PR binding sites in treated T47D cells. Validation of these data is shown in . FKBP5 is a well-known PR target and various PR binding sites have been reported in a distant intron of the human locus and in the first intron of the mouse one [25], [59]. Our ChIP-seq experiments identified a number of new PR binding sites several kilobases upstream of the FKBP5 transcript variant 1 (NM_004117) (), which is the PR-regulated transcript (). These sites were all verified by ChIP-qPCR (). We also assayed by ChIP-qPCR several other PR binding sites associated with PR-regulated genes and they were all found to be enriched for receptor binding ().Examination of our ChIP-seq dataset indicated that the majority of small GTPases, GAPs and GEFs (34 out of 59) were indeed associated with a PR-binding site within 50 Kb from the gene locus. Since the Rho GEFs are overrepresented among the PR targets listed in
, we selected three of them (NET1, FGD4 and AKAP13) for further analysis. Our ChIP-seq data revealed a PR binding site in an intron of the NET1 gene and in distal intergenic regions ∼50 Kb up-stream of the transcription start sites of FGD4 and AKAP13 (
). ChIP-qPCR experiments confirmed that these sites were enriched for PR binding after progestin treatment (
). To further show that receptor recruitment on these sites is not a random event, but it has a functional role, we performed ChIPs for H3K4me, a mark for promoters and enhancers [60]. We found that these sites were enriched for this histone modification (
) strongly arguing in favor of their regulatory role.Taken together the above data suggest that the PR can modulate the expression of a large number of small-GTPases and associated regulators. Several of them appear to be direct PR targets as their mRNA levels are affected shortly after progestin treatment and following PR binding in regulatory sites. Most PR targets are Ras and Rho small-GTPases regulators (GAPs and GEFs) (
). The Ras family members are involved in control of cell proliferation, while the Rho GTPases play an important role in cytoskeleton organization [23], and as such they are involved in cell adhesion, migration, proliferation, survival, differentiation and malignant transformation [61]. We confirmed that several Rho GEFs are under direct PR regulation (
). It is plausible that, in the breast cancer milieu, the intensified progestin stimulus induces over-expression of Rho GEFs leading to aberrant activation of the cognate small-GTPases, which, in this way, mediate hormonal control over important biological processes. In agreement with this hypothesis, the most common cause of aberrant small-GTPase signaling in cancer is the altered expression or activation of their regulators [61].Notably, it has been shown that non-genomic actions of the ligand-activated PR lead to activation of the RhoA/Rho-associated kinase (ROCK-2) cascade in T47D cells [52], [62]. This signaling pathway, ultimately, directs remodeling of the actin cytoskeleton and formation of membrane ruffles required for cell movement [52]; it, also, leads to rapid activation of the focal adhesion (FA) kinase and increased formation of FA complexes, which provide anchoring sites for cell attachment to the extracellular matrix during cell movement and invasion [62]. During preparation of this manuscript a study came out that described the PR-targetome during mammary gland branching morphogenesis [63]. Interestingly enough, the authors, also, identified components of "small-GTPases mediated signal transduction" to be highly enriched among the target genes [63]. They went on to show that progesterone activation of Rac (a Rho small-GTPase) signaling is necessary to induce side-brunching [63]. These findings are in agreement with our own data highlighting the small-GTPases signaling pathways as progestin-regulated networks in healthy and malignant mammary cells.
PR dictates alternative promoter use decisions in breast cancer cells
For the identification of differentially expressed transcripts (DETs) after progestin stimulation, we used Cuffdiff 2 [32] as described before. We set more stringent parameters for data analysis on the transcript level (≥1.5-fold change of expression and p-value≤0.05) leading to the identification of 1014 DETs (data not shown). To ensure the validity of our findings, we limited further analysis to the top 80 DETs (p-value≤5×10−5) (). Thirty of them were generated by genes that had multiple transcripts in the RefSeq database [64] and they are listed in . Annotation of the transcript variants (columns B and C in Suppl. file 3, sheet 2) was done according to the RefSeq database. The protein isoforms they encode are denoted as "canonical" according to the Uniprot database (column D). Information from both the RefSeq and the Uniprot databases was used in order to mark protein isoforms as "distinct" (different than the canonical) or "not distinct" (same as the canonical) (column D). Transcript variants generated by alternative splicing or alternate promoter usage relative to the canonical isoform are denoted as AS or AP respectively (column D). There is an equal number (12) of PR-regulated transcript variants that are generated either by AS or by AP, while 6 of them use both mechanisms (column F). Interestingly, the PR-regulated transcript variants of 26 out of the 29 genes (1 gene encodes for non-coding RNAs) encode for distinct protein isoforms (columns D and F), suggesting that progestin signaling may contribute significantly to the proteomic diversity of the cell.Eight (ARID5, GREB1, KANK1, NET1, PFKB3, RCAN1, TIPARP and TSC22D3) of these 30 genes had two or more transcript variants expressed in vehicle-treated T47D cells (data not shown), however, only one of these variants was differentially regulated after progestin treatment (). To confirm these results, we examined the 3 genes (NET1, KANK1 and TSC22D3) whose transcript variants appeared to be expressed in, approximately, similar levels in vehicle-treated T47D cells (
). We designed variant-specific primers and we performed a time-course RT-qPCR study; the results are shown in
. For all three genes, transcript variants 1 (herein called NET1.1, KANK1.1 and TSC22D3.1) retained a, relatively, stable level of expression, while transcript variants 2 (herein called NET1.2, KANK1.2 and TSC22D3.2) were strongly induced shortly after progestin treatment (
). This is in absolute agreement with the RNA-seq data (
). Given that an older study had found that both NET1 transcripts were up-regulated after progestin treatment [22], we used two different sets of primers for the NET1 transcript variant 1 (NET1.1) and we repeated the experiment in 3 biological replicates; both primer sets gave identical results in all three experiments (
).
Table 2
Expression levels (in FPKM) of transcript variants according to the RNA-seq data.
Official gene symbol
GenBank accession ID
R5020 (FPKM)
EtOH (FPKM)
log2(fold_change)
NET1
NET1 transcript variant 1
NM_001047160
10.0867
8.66451
−0.219266
NET1 transcript variant 2
NM_005863
118.522
5.59903
−4.40383
KANK1
KANK1 transcript variant 1
NM_153186
12.1742
1.73468
−2.81108
KANK1 transcript variant 2
NM_015158
1.87683
2.07454
0.14449
TSC22D3
TSC22D3 transcript variant 1
NM_198057
2.64031
2.02448
−0.383154
TSC22D3 transcript variant 2
NM_004089
38.2182
4.95535
−2.9472
Figure 5
PR regulation of specific transcript variants.
T47D cells were treated for 0 to 12-qPCR using transcript-specific primers. Expression levels were normalized to GAPDH. (A) Time course experiments for the NET1 transcripts show strong induction of the NET1.2 variant after progestin treatment, while the NET1.1 variant retains a stable expression pattern. To confirm the NET1.1 levels of expression, two different set of transcript-specific primers were used (in green and red) giving identical results. (B) Time course experiments for the two KANK1 transcripts confirm upregulation of the KANK1.2 after progestin treatment. (C) Time course experiments for the two TSC22D3 transcript variants show that only the TSC22D3.2 variant is strongly upregulated after progestin treatment. (D) ChIP-qPCR experiments show significant increase in polII binding on the promoters of NET1.2, KANK1.2 and TSC22D3.2 after 1 hr of progestin treatment (ethanol-treated control is set to 1) (p-value<0.05). Error bars represent SE except in (A), where they represent the SEM.
PR regulation of specific transcript variants.
T47D cells were treated for 0 to 12-qPCR using transcript-specific primers. Expression levels were normalized to GAPDH. (A) Time course experiments for the NET1 transcripts show strong induction of the NET1.2 variant after progestin treatment, while the NET1.1 variant retains a stable expression pattern. To confirm the NET1.1 levels of expression, two different set of transcript-specific primers were used (in green and red) giving identical results. (B) Time course experiments for the two KANK1 transcripts confirm upregulation of the KANK1.2 after progestin treatment. (C) Time course experiments for the two TSC22D3 transcript variants show that only the TSC22D3.2 variant is strongly upregulated after progestin treatment. (D) ChIP-qPCR experiments show significant increase in polII binding on the promoters of NET1.2, KANK1.2 and TSC22D3.2 after 1 hr of progestin treatment (ethanol-treated control is set to 1) (p-value<0.05). Error bars represent SE except in (A), where they represent the SEM.We further investigated the mechanisms that mediated this preferential transcript regulation. In all three genes, the alternative transcripts were generated by alternative promoters. In the case of KANK1 and NET1, our ChIP-seq experiments had revealed PR-binding sites in the first intron of KANK1.2 (not shown) and in the promoter of NET1.2 (
); these sites were confirmed by ChIP-qPCR experiments (Figures S2C and 4C respectively). For the TSC22D3 gene we did not find any PR-binding sites within a 50Kb distance. It is possible that our ChIP-seq experiments failed to capture such a site. It is also possible that the PR regulatory site is located in a greater distance or that the induction of TSC22D3.2 is not a direct transcriptional effect of the receptor. Since TSC22D3 is a known glucocorticoid-regulated gene with well-characterized GREs [65], we designed primers encompassing the GR-binding site and performed ChIP-qPCR for the PR. We did not find any enrichment for PR-binding in this region (data not shown). Subsequently, we performed polII ChIP-qPCR experiments and we showed that polII recruitment on the NET1.2, KANK1.2 and TSC22D3.2 promoters was increased in response to progestin, but it was not, significantly, affected on the promoters of the non-progestin regulated variants (
). Assumingly, PR recruits transcription co-factors that facilitate polII loading onto the regulated promoters. Communication of the PR complex with the non-regulated promoters may be prevented through binding of the insulator factor CTCF, as it has been suggested for the differential regulation of transcript variants by the estrogen receptor [22].These findings are of particular importance, since, oftentimes, transcript variants of the same gene encode for different protein isoforms with diverse functions. A striking example is provided by the NET1 gene itself. NET1 is a RhoA specific GEF. RhoA is aberrantly expressed in many humancancers, including breast cancer, and its activation is essential for cancer cell migration and invasion [66]. The two transcript variants generated by the NET1 gene (
B), a long one known as NET1 (NET1.1, NM_001047160) and a shorter one also known as NET1A (NET1.2, NM_005863), encode for isoforms with potentially different functions [22]. Studies have indicated that the NET1 isoform is important for cell proliferation [22], while the NET1A controls cytoskeleton reorganization and cell motility [22], [66], [67]. Specifically, NET1A is required for FAK activation, focal adhesion maturation and it is necessary for amoeboid extracellular matrix invasion of breast cancer cells [66]. It is, therefore, possible that NET1A plays an important role in metastatic breast cancer. According to our data PR induces over-expression specifically of the NET1A (NET1.2) variant. It is tempting to speculate that increased levels of NET1A may lead to aberrant activation of RhoA and mediate breast cancer metastasis.The transcript variants generated by KANK1 and TSC22D3, also, encode for distinct protein isoforms () with, potentially, different functions. KANK1 was first identified as a candidate tumor suppressor gene for renal cell carcinoma and it has several alternative promoters, by which different types of transcripts are generated from the same locus (reviewed in [68]). It encodes an ankyrin-repeat domain-containing protein, which, negatively, regulates the formation of actin stress fibers and cell migration through inhibition of RhoA, while it may also have a growth inhibitory effect [68]. Two types of KANK1 proteins have been reported with distinct tissue distribution [69], but they have not been functionally characterized. TSC22D3 (also known as GILZ) is a well-studied glucocorticoid-regulated gene in the immune system. Four different isoforms have been identified and characterized; these are not functionally redundant, but rather, they are involved in distinct aspects of cellular physiology and modulate distinct signaling pathways [70]. The PR-regulated variant (NM_004089) encodes for isoform GILZ1 [70]. GILZ1 appears to be the most potent isoform in stimulation of Na+ transport and repression of NF-κB. Few studies have addressed the role of GILZ in cancer. A recent study showed that it was expressed in epithelial ovarian cancer, where it increased tumor cell proliferation, activated AKT and regulated p21 and cyclin D1 expression, events that are associated with tumor progression [71].Taken together, the above data strongly suggest that the PR can dictate promoter use decisions leading to significant expression changes of specific transcript variants. Future studies should aim to characterize these variants, examine their potential clinical value in hormone responsive breast tumors and, more importantly, determine the functional properties of the isoforms they encode.
Conclusions
The era of next-generation sequencing has immensely advanced our understanding in nuclear hormone receptor signaling [72]. Several studies were particularly aimed to the estrogen receptor and they offered significant new insights into the complex regulatory gene networks controlled by the receptor in breast cancer cells [73]. However, this powerful technique has not been as extensively used in the study of the PR. A few recent papers have performed genome-wide mapping of PR binding sites in breast cancer cells [74], [75], [76] confirming that the receptor binds more frequently in intra- and inter- genic regions than in the promoters of target genes in agreement with earlier observations [25], [59]. To complement these studies and shed more light into the progestin-regulated gene networks in breast cancer cells, we employed 50bp paired-end sequencing to identify early responsive transcripts. Genes that display expression changes at early time points are more likely to be primary PR targets; however, these changes are usually of small magnitude and are often missed by microarray studies. Our experimental approach provided the necessary depth to detect such changes. We identified 1287 DEGs and we extensively validated new targets by RT-qPCR. Our data offer a new insight into the multifaceted role of PR in breast cancer biology and point to new routes future research can take. For example, we find that the PR alters the expression levels of key transcription factors and, in this manner, it may be affecting important transcriptional networks that govern cell fate. It remains to be seen whether these changes accommodate the needs of the cancer cell and corroborate the role of PR in promoting tumorigenesis. Of particular importance is the finding that the PR regulates a plethora of genes that participate in small-GTPase signaling cascades. Integration of the transcriptome and PR-cistrome profiling of hormonally treated cells strongly suggests that several small-GTPases regulators are direct PR targets. It is likely that progestin modulation of their expression levels leads to deregulation of the respective small-GTPases and the processes they control and eventually contributes to mammary tumorigenesis. Finally, our data reveal that the PR regulates the expression of specific transcript variants, and it, most likely, contributes to a more complex proteomic profile of the breast cancer cell. Future studies will show whether these specific PR-regulated transcripts may have clinical utility in prognosis and/or the development of targeted therapies.PR-regulated genes involved in cell death/apoptosis. GO annotation and literature search led to the functional categorization of genes involved in cell death as pro-apoptotic or anti-apoptotic. A few genes were denoted as both pro- and anti- apoptotic and were counted in both groups.(TIF)Click here for additional data file.ChIP-sequencing experiments for the PR identify receptor binding sites in T47D cells. (A) Cells treated with R5020 for 1 hr were used for ChIP experiments with an antibody against the PR. Immunoprecipitated DNA was used in sequencing experiments and representative data are shown here. Four PR binding sites (depicted by black blocks) were found in distal enhancer elements of FKBP5 transcript variant 1 (NM_004117), which is the PR-regulated transcript. (B) ChIP-qPCR experiments for the PR were performed in progestin- and vehicle- treated cells. The primers used amplified part of some of the PR binding sites (depicted by red arrows and labeled a-e) identified in the FKBP5 locus by ChIP-seq. Error bars indicate the SEM. (single asterisk indicates p-value<0.05, double asterisk p-value<0.005 and triple asterisk p-value<0.005). (C) As in (B), but primers used amplified part of the PR binding sites associated with the genes shown. A known PR binding site in the promoter of CDKN1A is used as a positive control and the promoter of GAPDH as a negative one (p-value<0.001, single asterisk indicates p-value<0.05, double asterisk p-value<0.01 and triple asterisk p-value<0.005).(TIF)Click here for additional data file.Differentially expressed genes in T47D cells after 3 hours of progestin treatment. The log2(fold change) of the ratio of EtOH-treated to R5020-treated is given. Genes identified by previous gene expression microarrays studies are accompanied by reference numbers.(XLSX)Click here for additional data file.Gene Ontology analysis of progestin-regulated genes.(XLSX)Click here for additional data file.Differentially expressed transcripts (DETs) in T47D cells after 3 hours of progestin treatment. The top 80 DETs (p-value≤5×10-5) are listed in table A_DETs. Thirty of them are generated by genes that have multiple transcripts in the RefSeq database [64] and they are listed in table B_annotated transcripts.(XLSX)Click here for additional data file.
Authors: Sumie Kato; Mauricio Pinto; Andrés Carvajal; Natalia Espinoza; Carolina Monso; Anil Sadarangani; Manuel Villalon; Jan J Brosens; John O White; Jennifer K Richer; Kathryn B Horwitz; Gareth I Owen Journal: J Clin Endocrinol Metab Date: 2004-11-23 Impact factor: 5.958
Authors: Karen O Yee; Caitlin M Connolly; Mark Duquette; Shideh Kazerounian; Raymond Washington; Jack Lawler Journal: Breast Cancer Res Treat Date: 2008-04-13 Impact factor: 4.872
Authors: Gerardo Heiss; Robert Wallace; Garnet L Anderson; Aaron Aragaki; Shirley A A Beresford; Robert Brzyski; Rowan T Chlebowski; Margery Gass; Andrea LaCroix; JoAnn E Manson; Ross L Prentice; Jacques Rossouw; Marcia L Stefanick Journal: JAMA Date: 2008-03-05 Impact factor: 56.272
Authors: Todd P Knutson; Andrea R Daniel; Danhua Fan; Kevin At Silverstein; Kyle R Covington; Suzanne Aw Fuqua; Carol A Lange Journal: Breast Cancer Res Date: 2012-06-14 Impact factor: 6.466
Authors: I Garcia-Murillas; R Sharpe; A Pearson; J Campbell; R Natrajan; A Ashworth; N C Turner Journal: Oncogene Date: 2013-06-10 Impact factor: 9.867
Authors: Megan R Schwarzman; Janet M Ackerman; Shanaz H Dairkee; Suzanne E Fenton; Dale Johnson; Kathleen M Navarro; Gwendolyn Osborne; Ruthann A Rudel; Gina M Solomon; Lauren Zeise; Sarah Janssen Journal: Environ Health Perspect Date: 2015-06-02 Impact factor: 9.031
Authors: Maria F Ogara; Santiago A Rodríguez-Seguí; Melisa Marini; Ana Silvina Nacht; Martin Stortz; Valeria Levi; Diego M Presman; Guillermo P Vicent; Adali Pecci Journal: Nucleic Acids Res Date: 2019-11-18 Impact factor: 16.971