Identification and validation of a new set of five genes for prediction of risk in early breast cancer.

Giorgio Mustacchi, Maria Pia Sormani, Paolo Bruzzi, Alessandra Gennari, Fabrizio Zanconati, Daniela Bonifacio, Adriana Monzoni, Luca Morandi.

Abstract

Molecular tests predicting the outcome of breast cancer patients based on gene expression levels can be used to assist in making treatment decisions after consideration of conventional markers. In this study we identified a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression datasets with R/Bioconductor packages. Using RT-qPCR, we evaluated paraffin-embedded sections from 261 consecutive invasive breast cancer cases not selected for age, adjuvant treatment, nodal status, or estrogen receptor status. The biological samples dataset was split into a training set (137 cases) and a validation set (124 cases). The gene signature was developed on the training set, where a multivariate stepwise Cox analysis selected five genes independently associated with disease free survival (DFS): FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), and SERF1A (HR = 0.83, p = 0.007). These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model: 0.125 × FGF18 − 0.560 × BCL2 + 0.409 × PRC1 + 0.104 × MMP9 − 0.188 × SERF1A (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The signature was then evaluated on the validation set by assessing its discrimination ability with Kaplan–Meier analysis, using the same cut-offs defined on the training set to classify patients at low, intermediate, or high risk of disease relapse (p < 0.001). After further clinical validation, our signature could be proposed as a prognostic signature for disease free survival in breast cancer patients for whom the indication for adjuvant chemotherapy added to endocrine treatment is uncertain.

Year:  2013        PMID: 23648477      PMCID: PMC3676806          DOI: 10.3390/ijms14059686

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

In the last few years, several multigene assays performed on tumor tissue from women with early breast cancer have been proposed to provide prognostic information and to discriminate between good and poor prognosis [1-15]. According to the main international guidelines, these assays might help clinicians make more informed decisions about chemotherapy [16,17]. The array-based gene expression assay Mammaprint® identifies a 70-gene signature indicative of poor prognosis in patients with lymph node-negative disease or with 1–3 positive nodes; in a non-randomized clinical setting, it predicted chemotherapy benefit in the "high risk" group and no apparent benefit in the "low risk" group [3-6]. It requires fresh or frozen tissue from the primary tumor [2,3]. The Oncotype DX® multigene assay measures the expression of 21 genes in paraffin-embedded tissue and calculates a recurrence score (RS) that classifies patients at low, intermediate, or high risk of recurrence. In two independent retrospective analyses of phase III clinical trials with adjuvant tamoxifen-alone control arms, the 21-gene RS assay defined a group of patients with low scores who did not appear to benefit from chemotherapy and a group with very high scores who derived major benefit from it, independently of age and tumor size [1,9-11]. Other studies using a supervised approach, with clinical outcome or tumor grade as the basis for gene discovery, have led to several commercial reference-laboratory assays for prognostication (MapQuant Dx [14], Theros Breast Cancer Index [15]). These multigene assays are expensive, and they have been validated in patients selected by age, nodal or estrogen receptor status, and/or adjuvant treatment received.
By analyzing several publicly available genome-wide, array-based gene expression datasets from the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), we identified a subset of 20 mRNAs differentially regulated in breast cancer. We then set up a protocol to evaluate these markers, with the aim of building a new gene signature based on real-time PCR on paraffin-embedded tissue in a "real life" breast cancer patient population. The enrolled cases were not selected for age, adjuvant treatment, nodal status, or estrogen receptor status.

2. Results and Discussion

Formalin-fixed, paraffin-embedded (FFPE) tissues are one of the largest tissue sources with well-documented clinical follow-up, which makes large-scale retrospective studies possible [18]. As recently described by Bussolati et al. [19], the ability to obtain high-quality total RNA from archival tissues will, in the near future, allow more powerful and robust gene expression analyses. To identify a small number of informative genes providing prognostic information in breast cancer, we first evaluated in silico a set of published signatures, using the array gene expression data of the 408 breast cancer cases deposited in the NCBI Gene Expression Omnibus. Through several steps (univariate analysis of the association with disease free survival (DFS), unsupervised hierarchical clustering, and multivariate Cox model selection), we identified 20 genes strongly associated with DFS. These candidate genes were then evaluated in vitro by RT-qPCR in a total of 261 cases forming the training (137 cases) and validation (124 cases) datasets (see the workflow in Figure 1).
Figure 1

Construction of the gene-set predictor/gene signature for risk prediction. (A) Gene selection on the published datasets; (B) Gene selection on the merged Gene Expression Omnibus (GEO) datasets; (C) Developing the gene signature.

2.1. Gene Selection on the Published Datasets

We used data deposited in the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/; GEO Series accession numbers GSE1456 and GSE3494), including 408 breast cancer cases. Files containing raw intensity data from the Affymetrix HU133A and HU133B arrays of the two datasets were preprocessed together in R/Bioconductor (GCRMA package, quantile normalization, median polish summarization) on the Michelangelo supercomputer (http://www.litbio.org). Candidate genes were selected from these datasets as those included in four previously proposed signatures: the "70-gene signature" developed by van't Veer et al. [2] and van de Vijver et al. [3] (70 genes), the "recurrence score" developed by Paik et al. [9] (21 genes), the "two-gene ratio model" [12] (2 genes), and the "Insulin Resistance" signature [20] (15 genes) (Table 1). Because some genes appear in more than one signature, the final extracted set comprised 98 genes (194 Affymetrix probes) (Table 1).
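
Quantile normalization, one of the preprocessing steps listed above, forces all arrays to share the same intensity distribution, so that remaining differences between arrays reflect expression rather than technical effects. The NumPy sketch below illustrates only that step, assuming a probes × arrays matrix of background-corrected, log-scale intensities; it is not the GCRMA implementation used in the study, and the toy values are hypothetical.

```python
import numpy as np

def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix.

    Each value is replaced by the mean, across arrays, of the values holding
    the same rank. Ties are broken by row order (a simplification).
    """
    order = np.argsort(x, axis=0, kind="stable")     # row indices that sort each column
    ranks = np.argsort(order, axis=0, kind="stable") # rank of each entry within its column
    reference = np.sort(x, axis=0).mean(axis=1)      # mean intensity at each rank
    return reference[ranks]

# Toy example: 4 probes measured on 3 arrays (hypothetical values).
intensities = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(intensities)
print(normalized)
```

After normalization, every column contains exactly the same set of values; only their order, which carries the biological signal, differs between arrays.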
Table 1

Genes selected from the previously published signatures. Group: 1 = van't Veer et al. [2]; 2 = Paik et al. [9]; 3 = Gennari et al. [20]; 4 = Ma et al. [12]; 1.5 = both van't Veer et al. [2] and Paik et al. [9]. Affychip: A = HU133A array, B = HU133B array.

Symbol | AffyID | Group | Affychip | Symbol | AffyID | Group | Affychip | Symbol | AffyID | Group | Affychip
ALDH4A1 | 203722_at | 1 | A | MKI67 | 212021_s_at | 2 | A | MCM6 | 201930_at | 1 | A
AP2B1 | 200612_s_at | 1 | A | MKI67 | 212022_s_at | 2 | A | MELK | 204825_at | 1 | A
AP2B1 | 200615_s_at | 1 | A | MKI67 | 212023_s_at | 2 | A | MKI67 | 212020_s_at | 2 | A
AURKA | 204092_s_at | 2 | A | MMP11 | 203876_s_at | 2 | A | SLC2A3 | 240055_at | 1 |
AURKA | 208079_s_at | 2 | A | MMP11 | 203878_s_at | 2 | A | ZNF533 | 229019_at | 1 |
AURKA | 208080_at | 2 | A | MMP9 | 203936_s_at | 1 | A | ZNF533 | 243929_at | 1 |
AYTL2 | 201818_at | 1 | A | MYBL2 | 201710_at | 2 | A | IGF1 | 209540_at | 3 | A
BAG1 | 202387_at | 2 | A | NDC80 | 204162_at | 1 | A | IGF1R | 203628_at | 3 | A
BAG1 | 211475_s_at | 2 | A | NUSAP1 | 218039_at | 1 | A | IGF2 | 202410_x_at | 3 | A
BBC3 | 211692_s_at | 1 | A | ORC6L | 219105_x_at | 1 | A | IGFBP4 | 201508_at | 3 | A
BC045642 | 212248_at | 1 | A | OXCT1 | 202780_at | 1 | A | IGFBP5 | 203424_s_at | 1 | A
BC045642 | 212250_at | 1 | A | PALM2-AKAP2 | 202759_s_at | 1 | A | IGFBP5 | 203425_s_at | 1 | A
BC045642 | 212251_at | 1 | A | PALM2-AKAP2 | 202760_s_at | 1 | A | IGFBP5 | 203426_s_at | 1 | A
BCL2 | 203684_s_at | 2 | A | PECI | 218025_s_at | 1 | A | IGFBP5 | 211958_at | 1 | A
BCL2 | 203685_at | 2 | A | PGR | 208305_at | 2 | A | IGFBP5 | 211959_at | 1 | A
BCL2 | 207004_at | 2 | A | PITRM1 | 205273_s_at | 1 | A | IGFBP6 | 203851_at | 3 | A
BCL2 | 207005_s_at | 2 | A | PQLC2 | 220453_at | 1 | A | IGFBP7 | 201163_s_at | 3 | A
BF034907 | 206023_at | 1 | A | PRC1 | 218009_s_at | 1 | A | IL17RB | 219255_x_at | 4 | A
BIRC5 | 202094_at | 2 | A | RAB6A | 201045_s_at | 1 | A | IL6ST | 204863_s_at | 3 | A
BIRC5 | 202095_s_at | 2 | A | RAB6A | 201047_x_at | 1 | A | INSIG1 | 201627_s_at | 3 | A
BIRC5 | 210334_x_at | 2 | A | RAB6A | 201048_x_at | 1 | A | IRS1 | 204686_at | 3 | A
C16orf61 | 218447_at | 1 | A | RAB6A | 210406_s_at | 1 | A | IRS2 | 209184_s_at | 3 | A
C20orf46 | 219958_at | 1 | A | RFC4 | 204023_at | 1 | A | LGP2 | 219364_at | 1 | A
C9orf30 | 205122_at | 1 | A | SCUBE2 | 219197_s_at | 1.5 | A | LOC643008 | 229740_at | 1 | B
C9orf30 | 205123_s_at | 1 | A | SERF1A | 219982_s_at | 1 | A | MCM6 | 238977_at | 1 | B
CCNB1 | 214710_s_at | 2 | A | SLC2A3 | 202497_x_at | 1 | A | MS4A7 | 223343_at | 1 | B
CCNE2 | 205034_at | 1 | A | SLC2A3 | 202498_s_at | 1 | A | MS4A7 | 223344_s_at | 1 | B
CCNE2 | 211814_s_at | 1 | A | SLC2A3 | 202499_s_at | 1 | A | MS4A7 | 224358_s_at | 1 | B
CD68 | 203507_at | 2 | A | SLC2A3 | 216236_s_at | 1 | A | PALM2-AKAP2 | 226694_at | 1 | B
CDC42BPA | 214464_at | 1 | A | SLC2A3 | 222088_s_at | 1 | A | QSOX2 | 227146_at | 1 | B
CENPA | 204962_s_at | 1 | A | STK32B | 219686_at | 1 | A | QSOX2 | 235239_at | 1 | B
CENPA | 210821_x_at | 1 | A | TGFB3 | 209747_at | 1 | A | RTN4RL1 | 229097_at | 1 | B
COL4A2 | 211964_at | 1 | A | TNFRSF10B | 209295_at | 3 | A | RTN4RL1 | 232596_at | 1 | B
COL4A2 | 211966_at | 1 | A | TNFRSF12A | 218368_s_at | 3 | A | RTN4RL1 | 242102_at | 1 | B
CTSL2 | 210074_at | 2 | A | TNFRSF21 | 214581_x_at | 3 | A | RUNDC1 | 226298_at | 1 | B
DCK | 203302_at | 1 | A | TNFSF10 | 214329_x_at | 3 | A | RUNDC1 | 235040_at | 1 | B
DIAPH3 | 220997_s_at | 1 | A | TSPYL5 | 213122_at | 1 | A | SERF1A | 223538_at | 1 | B
DTL | 218585_s_at | 1 | A | UCHL5 | 219960_s_at | 1 | A | SERF1A | 223539_s_at | 1 | B
ECT2 | 219787_s_at | 1 | A | WISP1 | 206796_at | 1 | A | SLC2A3 | 236180_at | 1 | B
EGLN1 | 221497_x_at | 1 | A | WISP1 | 211312_s_at | 1 | A | SLC2A3 | 236571_at | 1 | B
ESM1 | 208394_x_at | 1 | A | AA834945 | 230365_at | 1 | B | GRB7 | 210761_s_at | 2 | A
ESR1 | 205225_at | 2 | A | AA834945 | 235039_x_at | 1 | B | GSTM1 | 204418_x_at | 2 | A
ESR1 | 207672_at | 2 | A | AI224578 | 235247_at | 1 | B | GSTM1 | 204550_x_at | 2 | A
ESR1 | 211233_x_at | 2 | A | AI283268 | 232579_at | 1 | B | GSTM1 | 215333_x_at | 2 | A
ESR1 | 211234_x_at | 2 | A | AP2B1 | 234064_at | 1 | B | GSTM3 | 202554_s_at | 1 | A
ESR1 | 211235_s_at | 2 | A | AW014921 | 230710_at | 1 | B | HER2 | 210930_s_at | 2 | A
ESR1 | 211627_x_at | 2 | A | AW014921 | 236480_at | 1 | B | HER2 | 216836_s_at | 2 | A
ESR1 | 215552_s_at | 2 | A | AYTL2 | 241511_at | 1 | B | HOXB13 | 209844_at | 4 | A
ESR1 | 217163_at | 2 | A | CDCA7 | 224428_s_at | 1 | B | HRASLS | 219983_at | 1 | A
ESR1 | 217190_x_at | 2 | A | CDCA7 | 230060_at | 1 | B | HRASLS | 219984_s_at | 1 | A
EXT1 | 201995_at | 1 | A | COL4A2 | 237624_at | 1 | B | IDE | 203328_x_at | 3 | A
EXT1 | 215206_at | 1 | A | DCK | 224115_at | 1 | B | FBXO31 | 223745_at | 1 | B
FBXO31 | 219784_at | 1 | A | DTL | 222680_s_at | 1 | B | FBXO31 | 224162_s_at | 1 | B
FBXO31 | 219785_s_at | 1 | A | EBF4 | 233032_x_at | 1 | B | FBXO31 | 236873_at | 1 | B
FBXO31 | 222352_at | 1 | A | EBF4 | 233850_s_at | 1 | B | FGF18 | 231382_at | 1 | B
FGF18 | 206986_at | 1 | A | ECT2 | 234992_x_at | 1 | B | FLT1 | 226497_s_at | 1 | B
FGF18 | 206987_x_at | 1 | A | ECT2 | 237241_at | 1 | B | FLT1 | 226498_at | 1 | B
FGF18 | 211029_x_at | 1 | A | EGLN1 | 223045_at | 1 | B | FLT1 | 232809_s_at | 1 | B
FGF18 | 211485_s_at | 1 | A | EGLN1 | 223046_at | 1 | B | GPR180 | 231871_at | 1 | B
FGF18 | 214284_s_at | 1 | A | EGLN1 | 224314_s_at | 1 | B | GPR180 | 232912_at | 1 | B
FLT1 | 204406_at | 1 | A | EXT1 | 232174_at | 1 | B | GSTM3 | 235867_at | 1 | B
FLT1 | 210287_s_at | 1 | A | EXT1 | 234634_at | 1 | B | LOC286052 | 241370_at | 1 | B
FLT1 | 222033_s_at | 1 | A | EXT1 | 237310_at | 1 | B
GMPS | 214431_at | 1 | A | EXT1 | 239227_at | 1 | B
GNAZ | 204993_at | 1 | A | EXT1 | 239414_at | 1 | B
GPR126 | 213094_at | 1 | A | EXT1 | 242126_at | 1 | B

2.2. Gene Selection on the Merged GEO Datasets

The 98 genes selected from the published signatures were first tested in univariate analysis for their association with disease free survival (DFS). Forty-eight genes were associated with DFS with a p value < 0.01 and were retained for the next step. An unsupervised hierarchical clustering algorithm was then used to group genes with similar expression profiles into 20 clusters. Within each cluster, the gene most strongly associated with DFS in a multivariate Cox model was selected; the final 20-gene set, all highly associated with DFS, is reported in Table 2.
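
The three-step filter above (univariate screen, clustering, one representative per cluster) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the per-gene univariate p-values are taken as given input, average-linkage clustering on 1 − Pearson correlation stands in for the clustering settings, which the text does not specify, and, as a simplification, each cluster's representative is the gene with the smallest univariate p-value rather than the one chosen by the multivariate Cox model used in the study. The function names are hypothetical.

```python
import numpy as np

def agglomerative_clusters(x: np.ndarray, n_clusters: int):
    """Average-linkage clustering of the rows of x (genes x samples), distance = 1 - Pearson r."""
    dist = 1.0 - np.corrcoef(x)
    clusters = [[i] for i in range(x.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all gene pairs across the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def select_representatives(expr, pvalues, p_cut=0.01, n_clusters=20):
    """Keep genes with p < p_cut, cluster them, return the lowest-p gene index per cluster."""
    keep = np.flatnonzero(pvalues < p_cut)
    k = min(n_clusters, len(keep))
    groups = agglomerative_clusters(expr[keep], k)
    reps = []
    for g in groups:
        idx = keep[g]  # map cluster members back to original gene indices
        reps.append(int(idx[np.argmin(pvalues[idx])]))
    return sorted(reps)
```

With the study's numbers, `expr` would hold the 98 candidate genes, 48 would pass the p < 0.01 screen, and 20 representatives would be returned.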
Table 2

Final 20 genes set, all highly associated with Disease free survival (DFS).

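Index | Symbol | Cluster | AffyID | Group | Chip | log HR | HR | p value
114 | PRC1 | 1 | 218009_s_at | 1 | A | 0.26 | 1.29 | <0.00001
120 | ORC6L | 16 | 219105_x_at | 1 | A | 0.36 | 1.44 | 0.000201
38 | MMP9 | 14 | 203936_s_at | 1 | A | 0.14 | 1.15 | 0.000607
11 | AYTL2 | 5 | 201818_at | 1 | A | 0.38 | 1.46 | 0.000828
69 | TGFB3 | 3 | 209747_at | 1 | A | −0.23 | 0.79 | 0.000860
145 | SERF1A | 19 | 223539_s_at | 1 | B | 0.36 | 1.44 | 0.001192
163 | FGF18 | 8 | 231382_at | 1 | B | −0.41 | 0.67 | 0.003375
156 | QSOX2 | 18 | 227146_at | 1 | B | 0.51 | 1.66 | 0.003409
143 | MS4A7 | 15 | 223344_s_at | 1 | B | −0.16 | 0.85 | 0.004351
126 | FBXO31 | 7 | 219785_s_at | 1 | A | 0.31 | 1.36 | 0.004459
164 | GPR180 | 9 | 231871_at | 1 | B | 0.33 | 1.39 | 0.005603
54 | PITRM1 | 17 | 205273_s_at | 1 | A | 0.26 | 1.30 | 0.007143
33 | BCL2 | 6 | 203685_at | 2 | A | −0.16 | 0.85 | 0.003310
68 | IGF1 | 2 | 209540_at | 3 | A | −0.22 | 0.80 | 0.000001
35 | IGFBP6 | 2 | 203851_at | 3 | A | −0.40 | 0.67 | 0.000002
47 | IL6ST | 12 | 204863_s_at | 3 | A | −0.19 | 0.83 | 0.000028
45 | IRS1 | 13 | 204686_at | 3 | A | −0.19 | 0.82 | 0.001258
7 | IGFBP7 | 4 | 201163_s_at | 3 | A | −0.41 | 0.66 | 0.001529
102 | TNFSF10 | 20 | 214329_x_at | 3 | A | −0.20 | 0.82 | 0.004448
26 | IDE | 11 | 203328_x_at | 3 | A | 0.52 | 1.68 | 0.005188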

2.3. Tumor Samples

Of 350 consecutive invasive breast cancer patients with full information about tumor, adjuvant treatments, follow-up, relapse, death, and causes of death, treated between 1998 and 2001, 89 cases (25.4%) were excluded from the study because of low RNA concentration (below 10 ng/μL) or high degradation (Ct values for ACTB and B2M above 34). The remaining 261 cases were split into two biological sample datasets, a training set (137 cases) and a validation set (124 cases), by a simple criterion of consecutiveness. The clinical and demographic characteristics of the patients in the training and validation sets are summarized in Table 3 and reported in detail in the supplementary file. Because the sets were built by consecutiveness, the training set has a longer mean follow-up (100.7 months; range 59–123) than the validation set (89.2 months; range 61–121). Nevertheless, the only significant differences between the two sets were the use of anthracycline-based regimens in the adjuvant setting (training 16% vs. validation 32.2%; p = 0.01) and a higher incidence of G3 tumors in the validation set (30.6% vs. 19.7%; p = 0.04). HER2 status is largely missing because of the period in which the cases were treated (1998–2001); it was evaluated retrospectively in only 40% of relapsed patients. All other clinical and biological features were similar, reflecting the "real life" picture of the disease in north-eastern Italy at that time.
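
The exclusion criteria and the consecutive split described above can be expressed as a small filter. This is a hypothetical sketch: the `Case` record and its field names are illustrative, and it assumes a case is excluded when either housekeeping-gene Ct exceeds 34 (the text does not state whether one or both must exceed the threshold).

```python
from dataclasses import dataclass

@dataclass
class Case:
    """Hypothetical per-case record; field names are illustrative."""
    case_id: int          # enrollment order
    rna_ng_per_ul: float  # RNA concentration
    ct_actb: float        # Ct of the ACTB housekeeping gene
    ct_b2m: float         # Ct of the B2M housekeeping gene

def passes_qc(case: Case, min_rna: float = 10.0, max_ct: float = 34.0) -> bool:
    """Reject low RNA concentration (< 10 ng/uL) or degraded RNA (either housekeeping Ct > 34)."""
    return (case.rna_ng_per_ul >= min_rna
            and case.ct_actb <= max_ct
            and case.ct_b2m <= max_ct)

def consecutive_split(cases, n_training: int):
    """Keep QC-passing cases in enrollment order; the first n_training form the training set."""
    kept = [c for c in sorted(cases, key=lambda c: c.case_id) if passes_qc(c)]
    return kept[:n_training], kept[n_training:]
```

Applied in enrollment order with `n_training=137` to the 261 cases passing QC, this reproduces the logic of the 137/124 split; the excluded fraction in this study was 89/350 ≈ 25.4%.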
Table 3

Characteristics of patients and tumors in the Training and Validation sets.

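Characteristic | Training Set | Validation Set | p value
No. of patients | 137 | 124 | ns
Mean age, years (range) | 62.3 (35–87) | 61.1 (33–87) | ns
Mean follow-up, months (range) | 100.7 (59–123) | 89.2 (61–121) | ns

Histology | Training n | Training % | Validation n | Validation % | p value
Ductal | 86 | 62.8 | 83 | 66.9 | ns
Lobular | 26 | 19 | 16 | 12.9 | ns
Tubular-Lobular | 12 | 8.8 | 10 | 8.5 | ns
Medullary/Apocrine | 2 | 1.4 | 3 | 2.4 | ns
Other | 11 | 8.02 | 12 | 9.6 | ns

T Size
T1 | 78 | 56.9 | 82 | 66.1 | ns
T2 | 53 | 38.7 | 37 | 29.8 | ns
T3 | 3 | 2.2 | 3 | 2.4 | ns
Tx | 3 | 2.2 | 2 | 1.6 | ns

N Status
pN0 | 89 | 65 | 75 | 60.5 | ns
pN1a | 26 | 19 | 26 | 21 | ns
pN+ 4–10 | 11 | 8.1 | 7 | 5.6 | ns
pN+ >10 | 10 | 7.3 | 14 | 11.3 | ns
NX | 0
ER/PgR pos | 123 | 85.4 | 97 | 76.38 | ns
HER2 NA | 125 | 91.2 | 79 | 73.7 | p = 0.05*

Grading
G1 | 33 | 24.1 | 20 | 16.1 | ns
G2 | 51 | 37.2 | 57 | 46 | ns
G3 | 27 | 19.7 | 38 | 30.6 | p = 0.04
G NA | 26 | 19 | 9 | 7.3 | ns

Ki67
High (>14%) | 60 | 43.8 | 60 | 48.4 | ns
Low (<15%) | 77 | 56.2 | 60 | 48.4 | ns

Adjuvant chemo | 49 | 35.8 | 57 | 46 | ns
Anthracycline-based | 22 | 16 | 40 | 32.2 | p = 0.01
Adjuvant endocrine (any) | 110 | 80.3 | 96 | 77.4 | p = 0.01
Relapses | 33 | 24 | 38 | 30.6 | ns
Mean DFS, months | 51.4 | | 47.2 | | ns
Deaths | 33 | 24 | 39 | 31.4 | ns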

* In the Validation Set, HER2 status was evaluated in relapsed patients.

2.4. Signature Definition on the Training Set

A multivariate stepwise Cox analysis including the 20 selected genes was run on the training-set samples. The model selected a final set of five genes independently associated with DFS (Table 4): FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), and SERF1A (HR = 0.83, p = 0.007).
Table 4

Genes selected in the five-gene signature (variables in the equation).

Gene | B | SE | Wald | df | Sig. | Exp(B) | 95.0% CI for Exp(B), lower | 95.0% CI for Exp(B), upper
FGF18 | 0.125 | 0.064 | 3.736 | 1 | 0.053 | 1.133 | 0.998 | 1.285
BCL2 | −0.56 | 0.173 | 10.4444 | 1 | 0.001 | 0.571 | 0.407 | 0.802
PRC1 | 0.409 | 0.12 | 11.712 | 1 | 0.001 | 1.506 | 1.191 | 1.903
MMP9 | 0.104 | 0.06 | 3.031 | 1 | 0.082 | 1.109 | 0.987 | 1.247
SERF1A | −0.188 | 0.069 | 7.375 | 1 | 0.007 | 0.828 | 0.723 | 0.949
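
Each row of Table 4 follows from B and SE alone: Exp(B) is the hazard ratio, the Wald statistic is (B/SE)², and the 95% CI is exp(B ± 1.96·SE). The sketch below, assuming the usual normal approximation for B, recomputes these quantities for the BCL2 row; small differences from the printed Wald values are expected because B and SE are rounded in the table.

```python
import math

def wald_summary(b: float, se: float, z_crit: float = 1.959964):
    """Return (HR, CI lower, CI upper, Wald chi-square, p) for one Cox coefficient."""
    hr = math.exp(b)
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    wald = (b / se) ** 2
    # For 1 degree of freedom, P(chi2 > W) = erfc(sqrt(W / 2)), i.e. a two-sided z-test.
    p = math.erfc(math.sqrt(wald / 2.0))
    return hr, lower, upper, wald, p

hr, lo, hi, wald, p = wald_summary(-0.56, 0.173)  # BCL2 row of Table 4
print(f"HR={hr:.3f} CI={lo:.3f}-{hi:.3f} Wald={wald:.2f} p={p:.4f}")
```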
These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model (Table 4):

score = 0.125 × FGF18 − 0.560 × BCL2 + 0.409 × PRC1 + 0.104 × MMP9 − 0.188 × SERF1A

This score ranged from −2.95 to 2.91, with a mean of −0.48 and an SD of 1.00. The linear score was strongly associated with DFS in the training set (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The score was then categorized into three groups according to the tertiles of its distribution. DFS in the three risk groups is shown in Figure 2: compared with patients with a low-risk signature, patients with an intermediate-risk signature had an HR of 6.03 (95% CI = 1.35–27.0, p = 0.019) and patients with a high-risk signature an HR of 10.8 (95% CI = 2.51–46.64, p = 0.001).
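
The signature is a weighted sum of five normalized expression values, cut into tertiles. A minimal sketch, assuming the inputs are on the same normalized expression scale used to fit the Cox model; the tertile cut-offs are estimated on the training scores and then reused unchanged on new samples, as was done for the validation set. Function names are illustrative.

```python
import numpy as np

# Cox coefficients from Table 4.
COEFFICIENTS = {"FGF18": 0.125, "BCL2": -0.560, "PRC1": 0.409, "MMP9": 0.104, "SERF1A": -0.188}
GENES = list(COEFFICIENTS)  # column order expected in the expression matrix

def signature_score(expression: np.ndarray) -> np.ndarray:
    """Linear score for a samples x 5 matrix whose columns are ordered as GENES."""
    weights = np.array([COEFFICIENTS[g] for g in GENES])
    return expression @ weights

def tertile_cutoffs(training_scores: np.ndarray) -> np.ndarray:
    """Two cut-offs at the 1/3 and 2/3 quantiles of the training-set scores."""
    return np.quantile(training_scores, [1 / 3, 2 / 3])

def risk_group(scores: np.ndarray, cutoffs: np.ndarray) -> np.ndarray:
    """0 = low, 1 = intermediate, 2 = high risk."""
    return np.digitize(scores, cutoffs)
```

Freezing the cut-offs on the training set, rather than recomputing tertiles on the validation set, is what makes the validation an honest test of the fixed rule.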
Figure 2

Training set: probability of 5-year relapse. Disease free survival (DFS) according to the risk groups defined by the gene signature in the training set: low risk group (blue curve), intermediate risk group (green curve), high risk group (red curve). The hazard ratio (HR) of DFS for intermediate-risk versus low-risk patients is 6.0 (95% confidence interval (CI) = 1.35–27.0, p = 0.019), and the HR of DFS for high-risk versus low-risk patients is 10.8 (95% CI = 2.51–46.6, p = 0.001).

2.5. Signature Evaluation on the Validation Set

The signature defined on the training set was evaluated on the independent data of the 124 patients in the validation set. Its discrimination ability was assessed by Kaplan–Meier analysis, using the same cut-offs defined on the training set to classify patients at low, intermediate, or high risk of disease relapse. The score was strongly associated with DFS in the validation set as well (p < 0.001) (Figure 3). Compared with patients with a low-risk signature, patients with an intermediate-risk signature had an HR of 2.1 (95% CI = 0.72–6.2, p = 0.17) and patients with a high-risk signature an HR of 5.4 (95% CI = 2.0–14.4, p = 0.001).
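
The Kaplan–Meier curves in Figures 2 and 3 are product-limit estimates of DFS within each risk group. A minimal NumPy sketch of the estimator, assuming follow-up times in months and an event indicator equal to 1 for relapse and 0 for censoring; a survival library would additionally provide confidence bands and group comparisons.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of the survival (DFS) function.

    time  : follow-up times, e.g. months to relapse or last contact
    event : 1 if relapse observed, 0 if censored
    Returns the distinct relapse times and S(t) just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)             # patients still under observation at t
        relapses = np.sum((time == t) & event)  # relapses occurring exactly at t
        s *= 1.0 - relapses / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

Calling it once per risk group (low, intermediate, high) yields one step curve per group, which is what the figures plot.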
Figure 3

Validation set: probability of 5-year relapse. Disease free survival (DFS) according to the risk groups defined by the gene signature in the validation set: low risk group (blue curve), intermediate risk group (green curve), high risk group (red curve). The hazard ratio (HR) of DFS for intermediate-risk versus low-risk patients is 2.1 (95% confidence interval (CI) = 0.72–6.2, p = 0.17), and the HR of DFS for high-risk versus low-risk patients is 5.4 (95% CI = 2.0–14.4, p = 0.001).

2.6. Inter and Intra Assay Reproducibility

Three serial sections from each of three cases were evaluated independently in triplicate, and coefficients of variation (CVs) of the recurrence score were calculated within the same run and across different runs. The intra-assay and inter-assay CVs were 3.7% and 4.7%, respectively.
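
The coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes a simple scheme that the text does not spell out: the intra-assay CV is the average of the within-run CVs, and the inter-assay CV is the CV of the per-run means. CV is only meaningful for positive-valued measurements, so the replicate values are assumed to be on such a scale.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return stdev(values) / mean(values) * 100.0

def intra_inter_cv(runs):
    """runs: one list of replicate measurements per run, all on the same section.

    Intra-assay CV: average of the within-run CVs.
    Inter-assay CV: CV of the per-run means.
    """
    intra = mean([cv_percent(r) for r in runs])
    inter = cv_percent([mean(r) for r in runs])
    return intra, inter
```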

2.7. Univariate Analysis

In univariate analysis, the variables significantly related to DFS were nodal status (p = 0.0000001), T size (p = 0.000002), the five-gene signature (p = 0.000043), Ki67 (p = 0.0007), and grading (p = 0.027) (Table 5).
Table 5

Univariate analysis.

Variable | Regression coefficient (B) | SE | Exp (B) | Mean | Z-value | Probability level
Nodal Status (pN0/pN1a/pN2) | 0.591 | 0.100 | 1.806 | 0.062 | 5.1 | 0.0000001
T Size (pT1/pT2/pT3) | 0.03647 | 0.007639 | 1.037 | 20.195 | 4.77 | 0.000002
5 gene Signature (High/Intermediate/Low) | 0.646 | 0.158 | 1.909 | 1.984 | 4.09 | 0.000043
Ki67 (High/Low) | 0.427 | 0.126 | 1.533 | 1.933 | 3.38 | 0.0007
Grading (G1/G2/G3) | 0.298 | 0.135 | 1.348 | 1.798 | 2.2 | 0.027

2.8. Multivariate Analysis

Multivariate analysis (Cox regression) indicated that nodal status (p = 0.00001), T size (p = 0.0002), and the five-gene signature (p = 0.0004) were significantly related to DFS, whereas Ki67 (cut-off: 14%), grading, and adjuvant chemotherapy or endocrine treatment were not (Table 6). The HR of the five-gene signature was only slightly affected by adjuvant treatment: Table 7 summarizes the five-gene signature results in the presence or absence of adjuvant treatment.
Table 6

Multivariate Cox regression analysis.

Variable | Regression coefficient (B) (95% CI) | SE | Exp (B) | Mean | Z-value | Probability level
Nodal Status (pN0/pN1a/pN2) | 0.551 (0.350 to 0.752) | 0.102 | 1.736 | 0.655 | 5.379 | 0.00001
T Size (pT1/pT2/pT3) | 0.562 (0.269 to 0.854) | 0.149 | 1.754 | 1.449 | 3.762 | 0.0002
5 gene Signature (High/Intermediate/Low) | 0.666 (0.298 to 1.034) | 0.187 | 1.947 | 1.9767 | 3.549 | 0.0004
Ki67 (High/Low) | 0.27 (−0.028 to 0.569) | 0.152 | 1.31 | 1.748 | 1.77 | 0.076
Grading (G1/G2/G3) | −0.111 (−0.387 to 0.164) | 0.14 | 0.894 | 1.798 | −0.792 | 0.428
AdjChemo (Yes/No) | 0.061 (−0.479 to 0.601) | 0.275 | 1.063 | 1.604 | 0.221 | 0.824
Adj Endocrine (Yes/No) | 0.032 (−0.556 to 0.622) | 0.3 | 1.033 | 1.209 | 0.109 | 0.912
Table 7

Hazard ratios (log-rank, Cox–Mantel) for the five-gene signature in the presence (YES) or absence (NO) of chemo or endocrine adjuvant treatment.

5 Gene Score | HR (YES) | 95% CI (YES) | p value (YES) | HR (NO) | 95% CI (NO) | p value (NO)
Low vs. High | 0.35 | 0.20–0.60 | 0.0006 | 0.16 | 0.08–0.32 | 0.0001
Low vs. Intermediate | 0.98 | 0.45–2.11 | 0.9 | 0.29 | 0.11–0.77 | 0.0224
Intermediate vs. High | 0.4 | 0.23–0.69 | 0.002 | 0.56 | 0.29–1.06 | 0.089
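
Table 7 compares risk groups pairwise with the log-rank (Cox–Mantel) test. For reference, a minimal two-group log-rank test in pure Python is sketched below. It assumes right-censored data with an event indicator (1 = relapse) and at least one relapse in each group; it returns the chi-square statistic (1 degree of freedom), its p-value, and the common observed/expected approximation to the hazard ratio, (O_a/E_a)/(O_b/E_b). Whether the study computed its HRs exactly this way is not stated, so treat this as an illustration.

```python
import math

def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test. Returns (chi2, p_value, hr_oe)."""
    data = [(t, e, 0) for t, e in zip(time_a, event_a)] + \
           [(t, e, 1) for t, e in zip(time_b, event_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_a = exp_a = obs_b = exp_b = var = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk in group a
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk in group b
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d_b = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        e_a = d * n_a / n  # expected relapses in group a under H0
        obs_a += d_a
        exp_a += e_a
        obs_b += d_b
        exp_b += d - e_a
        if n > 1:  # hypergeometric variance contribution
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (obs_a - exp_a) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    hr_oe = (obs_a / exp_a) / (obs_b / exp_b)
    return chi2, p, hr_oe
```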

2.9. Discussion

In this study we developed a five-gene recurrence score able to estimate the likelihood of recurrence in a series of consecutive breast cancer tissue samples. These five informative genes were selected by the multistep approach summarized in Figure 1. First, we identified in silico a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression datasets with R/Bioconductor packages. We then evaluated in vitro the expression levels of these 20 genes in paraffin-embedded sections from 261 consecutive invasive breast cancer cases not selected for age, adjuvant treatment, nodal status, or estrogen receptor status. The only requirement was a minimum follow-up of 5 years with full clinical data. Each tissue block was reviewed by a pathologist to ensure a tumor cell content greater than 70%. Gene expression analysis was based on RT-qPCR. The biological samples dataset was split into a training and a validation dataset. The gene signature was developed on the training set by a multivariate stepwise Cox analysis that selected five genes independently associated with DFS. These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model. The signature was then evaluated on the validation set, assessing its discrimination ability by Kaplan–Meier analysis with the same cut-offs, defined on the training set, for classifying patients at low, intermediate, or high risk of disease relapse. The five genes were identified without any a priori selection for gene function or involvement in cancer, solely on the basis of the relationship between their expression level and DFS. Interestingly, except for SERF1A, whose function is still unknown, they have all been described as playing an important role in cancer, as follows. FGF18: its over-expression in tumors has been demonstrated [21,22].
FGF18 expression is up-regulated through the constitutive activation of the Wnt pathway observed in most colorectal carcinomas [23]. As a secreted protein, FGF18 can thus affect both tumor cells and the connective tissue cells of the tumor microenvironment. BCL2: over-expression of the BCL2 protein has been identified in a variety of solid malignancies, including breast cancer. BCL2 transcript over-expression is related to unfavorable prognosis in Oncotype DX [9] and in Mammaprint® [3]. PRC1: it associates with the mitotic spindle and plays a crucial role in the completion of cytokinesis [24,25]. PRC1 is negatively regulated by p53 and is over-expressed in p53-defective cells [26], suggesting that the gene is tightly regulated in a cancer-specific manner. MMP9: metalloproteases are frequently up-regulated in the tumor microenvironment [27]. MMP9 influences many aspects of tissue function by cleaving a diverse range of extracellular matrix proteins, cell adhesion molecules, and cell surface receptors, and it regulates the bioavailability of many growth factors and chemokines [28]. SERF1A: its function is not yet known. The biological properties of these genes relate to four of the six hallmarks of cancer proposed by Hanahan et al. [29,30]: FGF18 belongs to the "self-sufficiency in growth signals" group, BCL2 to "evading apoptosis", PRC1 to "limitless replicative potential", and MMP9 to "tissue invasion and metastasis", while the function of SERF1A is still unknown. These findings link our proposed molecular signature of breast cancer to the underlying capabilities acquired during the multistep development of human tumors, as previously categorized [29,30]. From an experimental point of view, our assay appears affordable and not time-consuming; it requires only FFPE tissue and could easily be performed in almost any laboratory with the required RT-qPCR instrumentation.
Importantly, it was validated in a "real life" clinical setting, on a set of consecutive breast cancer cases irrespective of age, nodal and estrogen receptor status, and adjuvant treatment, with a minimum follow-up of 5 years. An important limitation of our approach is that the test could be performed in only 74.6% of the initial set of cases (261 of 350) because of RNA degradation in FFPE tissues, consistent with the literature on other signatures [19,31,32]. RNA degradation can be monitored simply by evaluating the Ct values of the housekeeping genes used for normalization. Multicentric studies will be needed to evaluate possible pitfalls due to inter-laboratory experimental variability and, above all, to increase the reliability of the assay. A further step will be to analyze the predictive value of the five-gene signature in the ER-positive population, with respect to the benefit of tamoxifen alone and of chemotherapy added to tamoxifen.

3. Experimental Section

3.1. Tumor Samples Enrolled in This Study

Tumor samples were obtained from routinely processed formalin-fixed, paraffin-embedded sections retrieved from 350 consecutive invasive breast cancer patients with full information about tumor, adjuvant treatments, follow-up, relapse, death, and causes of death, treated between 1998 and 2001. To test our signature in a "real life" clinical setting, we used consecutive non-metastatic breast cancer cases irrespective of age, nodal and estrogen receptor status, and adjuvant treatment. The only requirement was a minimum follow-up of 5 years with full clinical data. All patient information was handled in accordance with review board-approved protocols and in compliance with the Declaration of Helsinki [33]. Hematoxylin and eosin (H & E) sections were reviewed to identify paraffin blocks with tumor areas. Histological type and grade were assessed according to the World Health Organization criteria [34]. The detailed histological and clinical features of each patient enrolled in this study are available in the supplementary information file. We selected paraffin blocks whose histology sections showed the highest relative amount of tumor vs. stroma and few infiltrating lymphoid cells, and that lacked significant areas of necrosis. Three 20 μm thick sections were cut, followed by one H & E control slide. The tumor area selected for analysis was marked on this control slide to ensure a neoplastic cell content greater than 70%. Dissected tumor areas ranged from 0.5 to 1.0 cm² in size.

3.2. Ethics Statement

The use of tissues for this study has been approved by the Ethics Committee of Centro Oncologico, ASS1 triestina & Università di Trieste, Italy. A comprehensive written informed consent was signed for the surgical treatment that produced the tissue samples and the related diagnostic procedures. All information regarding the human material used in this study was managed using anonymous numerical codes, clinical data were not used and samples were handled in compliance with the Helsinki declaration (http://www.wma.net/en/30publications/10policies/b3/).

3.3. Gene Expression Analysis on Breast Cancer Samples

3.3.1. RNA Isolation

Paraffin-embedded tumor material from the 20 μm thick sections was deparaffinized in xylene at 50 °C for 3 min and rinsed twice in absolute ethanol at room temperature. Total RNA was extracted using the RecoverAll kit (Ambion, Austin, TX, USA), including a DNase step, according to the manufacturer's recommended protocol. RNA concentration was measured with the Quant-iT™ RNA kit (Invitrogen, Carlsbad, CA, USA).

3.3.2. Primer Design

Primers were designed using Primer3 software (http://simgene.com/Primer3) and are described in Table 8. Amplicons were checked with MFOLD (http://mfold.rna.albany.edu/?q=mfold) to avoid secondary structures at the primer positions, and primer specificity was checked with repeatmasker (http://www.repeatmasker.org) and primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast).
Table 8

Primer sequences, slope, PCR efficiency and RSq of each of the 20 genes + 2 housekeeping genes.

Primer forwardPrimer reverseSlopeEfficiencyRSq
B2MATGAGTATGCCTGCCGTGTGAGGCATCTTCAAACCTCCATG−3.051112.7%0.992
ACTBTTGCCGACAGGATGCAGAAGGAAGGTGGACAGCGAGGCCAGGAT−3.116109.4%0.998
FBXO31GAGGACATCTTCCACGAGCACAGGTAGATGCGGCGGTAGGT−3.293101.2%0.995
FGF18GGTAGTCAAGTCCGGATCAAGGTCCAGAACCTTCTCGATGAACA−3.217104.6%0.952
BCL2AGTACCTGAACCGGCACCTGCAGAGACAGCCAGGAGAAATCA−3.78783.7%0.999
IGFBP7ATGAAGTAACTGGCTGGGTGCTTGAAGCCTGTCCTTGGGAAT−3.043113.1%0.997
IDEAGCCCTTCTCCATGGAAACATACAGCTGACTTGGAAGGAGAGGT−3.149107.8%0.998
AYTL2GTTGCCCTGTCTGTCGTCTGCTTGAGGATGCAGGACAGGT−3.057112.4%0.989
ORC6LTGAAGTGCCCCTTGGACAGCAGGCCCAGTAAACACTCAAAAG−3.093110.5%0.996
MS4A7CCCTCAAAGAGAGAAACCTGGAATCAACAGGCAACACAGGATCT−3.162107.1%0.964
QSOX2CGTGTTCTCTCTGGAAACTGTTCGAACGTACCTCCTCATTGTCTGC−3.236103.7%0.998
PITRM1GGAAAATTCACACAGCAAGACAAGAGGCCGTACAAGAAGTGGT−3.192105.7%0.997
TGFb3AACTTCTGCTCAGGCCCTTGAGGCAGATGCTTCAGGGTTC−3.216104.6%0.998
PRC-1-201CCGTGTCTCGACTTCCTCCTCGTTGAGCTCCAGGTTCTCC−3.092110.6%0.991
GPR180GATTCTACGCCTGCATCCACTCCCTGCTAAGTTGTGGTGTGAA−3.076111.4%0.996
MMP9GCAAGCTGGACTCGGTCTTCCTGTGTACACCCACACCTG−2.198185.1%0.953
IGFBP6GAATCCAGGCACCTCTACCACAGTCCAGATGTCTACGGCATGG−2.821126.2%0.998
IRS1CAGTTTCCAGAAGCAGCCAGAGGAGGATTTGCTGAGGTCATTTA−3.136108.4%0.990
IL6ST210CAGTGGTCACCTCACACTCCTCTTTGTCATTTGCTTCTATTTCCA−3.071111.7%0.972
IGF1TATCAGCCCCCATCTACCAACTCTTGTTTCCTGCACTCCCTCT−3.012102.3%0.998
TNSFTCCTCAGAGAGTAGCAGCTCACACCTTGATGATTCCCAGGAGTT−2.628140.2%0.759
SERF1ACCAGGAAATTAGCAAGGGAAAGCTTGTCTGCATAGACTTCTTCTCA−2.927119.6%0.974

3.3.3. Two Step RTqPCR Analysis

Fourteen μL of total RNA was reverse transcribed using the SuperScript® VILO™ cDNA Synthesis kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's recommended protocol. One microliter of cDNA was amplified in duplicate with 10 pmol of each primer (see Table 8 for sequences) in 1× QuantiFast™ SYBR® Green PCR solution (Qiagen, Hilden, Germany) in a final volume of 25 μL. Cycling conditions were 5 min at 95 °C followed by 40 cycles of 10 s at 95 °C and 30 s at 60 °C, on Stratagene Mx3000™ or ABI SDS 7000™ instruments. Fluorescence was read during the 60 °C step. For each primer set, standard curves made from serial dilutions of cDNA from the MCF7 cell line (see Table 8) were used to estimate PCR efficiency (E) with the formula E (%) = (10^(−1/slope) − 1) × 100. The expression levels of each of the 20 selected genes were normalized with GeNorm [35] using 2 housekeeping genes (B2M and ACTB), and relative quantification was calculated with the statistical computing language R. The human breast cancer cell line MCF7 (ATCC HTB22; derived from a human breast adenocarcinoma) was purchased from the American Type Culture Collection. Cells were maintained in minimal essential medium (MEM) (Invitrogen/Life Technologies, Villebon-sur-Yvette, France) supplemented with 2 mM L-glutamine, 1.5 g/L sodium bicarbonate, 0.1 mM non-essential amino acids, 1 mM sodium pyruvate, 0.01 mg/mL bovine insulin, and 10% fetal bovine serum (Thermo Scientific, Waltham, MA, USA) at 37 °C in a humidified atmosphere of 5% CO2.
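
The efficiency formula above and the normalization against two housekeeping genes can be sketched as follows. This is an illustration, assuming the geNorm convention of converting each Ct to a relative quantity against the lowest Ct observed for that gene, then dividing by the geometric mean of the reference-gene quantities; the study performed these steps with GeNorm and R, and the function names here are hypothetical.

```python
import math

def amplification_efficiency(slope: float) -> float:
    """PCR efficiency in percent from a standard-curve slope: E = (10^(-1/slope) - 1) * 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0

def relative_quantity(ct: float, ct_min: float, slope: float) -> float:
    """geNorm-style relative quantity: amplification factor ** (ct_min - ct)."""
    factor = 10 ** (-1.0 / slope)  # per-cycle amplification factor (2.0 at 100% efficiency)
    return factor ** (ct_min - ct)

def normalized_expression(target_q: float, reference_qs) -> float:
    """Divide the target quantity by the geometric mean of the reference-gene quantities."""
    nf = math.prod(reference_qs) ** (1.0 / len(reference_qs))  # normalization factor
    return target_q / nf
```

For example, the B2M slope of −3.051 in Table 8 corresponds to an efficiency of about 112.7%, and the BCL2 slope of −3.787 to about 83.7%, matching the tabulated values.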

3.4. Training and Validation Dataset

The dataset of biological samples was split into a training set and a validation set. The training set comprised the first 144 consecutive cases and the validation set the last 127 cases. The gene signature was developed on the training set. Once the signature had been fully specified, the validation set was accessed only once, solely to estimate the prediction accuracy of the signature. A multivariate stepwise Cox analysis including the 20 selected genes was run on the training-set samples to select genes independently associated with disease-free survival (DFS), using p < 0.10 as the criterion for inclusion. The overall workflow in Figure 1 summarizes every step, from the selection of markers from the literature to the validation of the gene signature. Reproducibility within and between blocks was assessed by performing the assay on serial sections from three blocks representing three cases. Finally, we performed a multivariate Cox proportional-hazards analysis in a model that included treatment received (no adjuvant therapy vs. chemotherapy, hormonal therapy, or both) and the final gene signature, with the training and validation sets combined, using the NCSS 2001 Statistical software (NCSS Inc., Kaysville, UT, USA, 2001).
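The locked-down validation described above can be sketched as follows. This Python sketch scores cases with the Cox coefficients reported in the abstract (0.125 FGF18 − 0.560 BCL2 + 0.409 PRC1 + 0.104 MMP9 − 0.188 SERF1A), splits cases by enrollment order, fixes risk-group cut-offs on the training set and applies them unchanged to the validation set. The actual cut-offs are not stated in this section, so the tertile rule in the hypothetical `fit_cutoffs` is only a placeholder; gene inputs are assumed to be on the same normalized scale used to fit the model, and the example scores are invented.

```python
from statistics import quantiles

# Cox-model coefficients of the 5-gene signature, as reported in the abstract
WEIGHTS = {"FGF18": 0.125, "BCL2": -0.560, "PRC1": 0.409, "MMP9": 0.104, "SERF1A": -0.188}


def signature_score(expression):
    """Linear score: sum of coefficient * normalized expression over the five genes."""
    return sum(w * expression[gene] for gene, w in WEIGHTS.items())


def split_consecutive(cases, n_train):
    """Temporal split: the first n_train consecutive cases train, the rest validate."""
    return cases[:n_train], cases[n_train:]


def fit_cutoffs(training_scores):
    """HYPOTHETICAL cut-offs: tertiles of the training-set scores only."""
    low, high = quantiles(training_scores, n=3)
    return low, high


def risk_group(score, cutoffs):
    """Assign a Low/Intermediate/High risk label using frozen cut-offs."""
    low, high = cutoffs
    if score < low:
        return "Low"
    if score < high:
        return "Intermediate"
    return "High"


# Invented signature scores for 12 consecutive cases (enrollment order)
scores = [0.2, 1.4, -0.3, 0.9, 2.1, -1.0, 0.5, 1.1, -0.6, 0.0, 1.8, -0.8]
train, valid = split_consecutive(scores, n_train=9)
cutoffs = fit_cutoffs(train)  # computed once, never refitted on validation data
print([risk_group(s, cutoffs) for s in valid])
```

Keeping `fit_cutoffs` restricted to the training scores mirrors the rule stated above: validation cases are scored once, with frozen weights and frozen cut-offs.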

3.5. Univariate and Multivariate Analysis

We performed univariate analyses of age, tumor size (T), nodal status, grade, Ki67, adjuvant treatment and the 5-gene signature, followed by a multivariate Cox proportional-hazards analysis in a model that included treatment received (no adjuvant therapy vs. chemotherapy, hormonal therapy, or both) and the 5-gene signature (low/intermediate/high risk), with the training and validation sets combined, using the NCSS 2001 Statistical software (NCSS Inc., Kaysville, UT, USA, 2001).
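The univariate step can be sketched from first principles. The Python below is a minimal single-covariate Cox proportional-hazards fit by Newton-Raphson on the partial likelihood, with the Breslow treatment of tied event times, returning the hazard ratio, a Wald 95% confidence interval and a two-sided Wald p-value. It is not the NCSS implementation the authors used, and the toy data are hypothetical; a multivariate model with treatment and a three-level risk group (which needs dummy coding) would require the matrix form of the same updates.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Single-covariate Cox model h(t | x) = h0(t) * exp(beta * x), fitted by
    Newton-Raphson on the partial likelihood (Breslow handling of ties).
    Returns (hazard ratio, (lower, upper) 95% CI, two-sided Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored cases enter only through the risk sets
            # Risk set: all cases still under observation at time t_i
            at_risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in at_risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, at_risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, at_risk))
            score += x_i - s1 / s0            # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, p_value


# Hypothetical toy data: follow-up time, event indicator (1 = relapse), covariate
hr, ci, p = cox_univariate(times=[1, 2, 3], events=[1, 1, 0], x=[1, 0, 1])
print(f"HR = {hr:.3f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, p = {p:.2f}")
```

On this toy data set the maximum can be checked by hand: with u = exp(β), the score equation reduces to 2u² = 1, so the fitted hazard ratio is 1/√2 ≈ 0.707.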

4. Conclusions

We developed a prognostic tool for early breast cancer based on the combined relative expression levels of FGF18, BCL2, PRC1, MMP9 and SERF1A. The signature showed good discriminating ability when tested on the validation set. We expect that, after further clinical validation in a larger number of cases, it could be proposed as an inexpensive prognostic signature for disease-free survival in breast cancer patients for whom the indication for adjuvant chemotherapy in addition to endocrine treatment is uncertain.
References (10 of 31 shown)

1.  The hallmarks of cancer.

Authors:  D Hanahan; R A Weinberg
Journal:  Cell       Date:  2000-01-07

2.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31

3.  FGF18 is required for normal cell proliferation and differentiation during osteogenesis and chondrogenesis.

Authors:  Norihiko Ohbayashi; Masaki Shibayama; Yoko Kurotaki; Mayumi Imanishi; Toshihiko Fujimori; Nobuyuki Itoh; Shinji Takada
Journal:  Genes Dev       Date:  2002-04-01

4.  Involvement of the FGF18 gene in colorectal carcinogenesis, as a novel downstream target of the beta-catenin/T-cell factor complex.

Authors:  Takashi Shimokawa; Yoichi Furukawa; Michihiro Sakai; Meihua Li; Nobutomo Miwa; Yu-Min Lin; Yusuke Nakamura
Journal:  Cancer Res       Date:  2003-10-01

5.  A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen.

Authors:  Xiao-Jun Ma; Zuncai Wang; Paula D Ryan; Steven J Isakoff; Anne Barmettler; Andrew Fuller; Beth Muir; Gayatry Mohapatra; Ranelle Salunga; J Todd Tuggle; Yen Tran; Diem Tran; Ana Tassin; Paul Amon; Wilson Wang; Wei Wang; Edward Enright; Kimberly Stecker; Eden Estepa-Sabal; Barbara Smith; Jerry Younger; Ulysses Balis; James Michaelson; Atul Bhan; Karleen Habin; Thomas M Baer; Joan Brugge; Daniel A Haber; Mark G Erlander; Dennis C Sgroi
Journal:  Cancer Cell       Date:  2004-06

6.  Metastatic potential of T1 breast cancer can be predicted by the 70-gene MammaPrint signature.

Authors:  Stella Mook; Michael Knauer; Jolien M Bueno-de-Mesquita; Valesca P Retel; Jelle Wesseling; Sabine C Linn; Laura J Van't Veer; Emiel J Rutgers
Journal:  Ann Surg Oncol       Date:  2010-01-22

7.  A gene-expression signature as a predictor of survival in breast cancer.

Authors:  Marc J van de Vijver; Yudong D He; Laura J van't Veer; Hongyue Dai; Augustinus A M Hart; Dorien W Voskuil; George J Schreiber; Johannes L Peterse; Chris Roberts; Matthew J Marton; Mark Parrish; Douwe Atsma; Anke Witteveen; Annuska Glas; Leonie Delahaye; Tony van der Velde; Harry Bartelink; Sjoerd Rodenhuis; Emiel T Rutgers; Stephen H Friend; René Bernards
Journal:  N Engl J Med       Date:  2002-12-19

8.  Hallmarks of cancer: the next generation.

Authors:  Douglas Hanahan; Robert A Weinberg
Journal:  Cell       Date:  2011-03-04

9.  PRC1 is a microtubule binding and bundling protein essential to maintain the mitotic spindle midzone.

Authors:  Cristiana Mollinari; Jean-Philippe Kleman; Wei Jiang; Guy Schoehn; Tony Hunter; Robert L Margolis
Journal:  J Cell Biol       Date:  2002-06-24

10.  Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.

Authors:  Jo Vandesompele; Katleen De Preter; Filip Pattyn; Bruce Poppe; Nadine Van Roy; Anne De Paepe; Frank Speleman
Journal:  Genome Biol       Date:  2002-06-18

