Anna Dvorkin-Gheva, John A Hassell.
Abstract
The existence of large, publicly available repositories of human breast tumor gene expression profiles provides an important resource for discovering new breast cancer biomarkers and therapeutic targets. For example, knowledge of the expression of the estrogen and progesterone hormone receptors (ER and PR) and of ERBB2 in breast tumor samples guides the choice of therapy for breast cancer patients whose tumors express these proteins. Identifying new biomarkers and therapeutic agents that affect the activity of signaling pathways regulated by the hormone receptors or ERBB2 might be accelerated by knowledge of their expression levels in large gene expression profiling data sets. Unfortunately, the status of these receptors is not always reported in public databases of breast tumor gene expression profiles. Attempts have been made to use a single probe set to identify ER, PR and ERBB2 status, but the specificity or sensitivity of such predictions is low. We asked whether estimation of the ER, PR and ERBB2 status of profiled tumor samples could be improved by using multiple probe sets representing these three genes and others with related expression. We used 8 independent datasets of human breast tumor samples to define gene expression signatures comprising 24, 51 and 14 genes predictive of ER, PR and ERBB2 status, respectively. These signatures, as demonstrated by sensitivity and specificity measures, reliably identified hormone receptor and ERBB2 expression in breast tumors whose status had previously been determined using protein- and DNA-based assays. Our findings demonstrate that gene signatures can be identified that reliably predict the expression status of the estrogen and progesterone hormone receptors and of ERBB2 in publicly available gene expression profiles of breast tumor samples. Using these signatures to query transcript profiles of breast tumor specimens may enable discovery of new biomarkers and therapeutic targets for particular subtypes of breast cancer.
Year: 2011 PMID: 22022496 PMCID: PMC3192779 DOI: 10.1371/journal.pone.0026023
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Selecting a gene signature predictive of ER status based on sensitivity and specificity.
The cutoff is based on Spearman rank correlation coefficients. The number of probe sets in each signature is marked by the number under the lowest curve. Black filled circles – specificity; gray circles – sensitivity; black line – sum of specificity and sensitivity. The optimal number of probe sets was 35, with Spearman rank correlation coefficient cutoff set at 0.43. Gray line and “*” indicate the sum of specificity and sensitivity of the prediction obtained by using a single “best probe set” (“205225_at”).
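The correlation-based selection described in this caption can be illustrated with a minimal sketch. It assumes that each probe set is correlated with a binary receptor status (1 = positive, 0 = negative) and kept when the absolute value of Spearman's rho reaches the cutoff; the use of the absolute value, the function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / (var_x * var_y) ** 0.5


def select_probe_sets(expression, status, cutoff):
    """Keep probe sets whose |rho| with the receptor status meets the cutoff."""
    selected = {}
    for probe_id, values in expression.items():
        rho = spearman(values, status)
        if abs(rho) >= cutoff:
            selected[probe_id] = rho
    return selected


# Hypothetical toy data: six tumors and three made-up probe set names.
status = [0, 0, 0, 1, 1, 1]
expression = {
    "probe_up": [1.0, 1.2, 0.9, 3.1, 2.8, 3.5],
    "probe_down": [5.0, 4.8, 5.2, 2.0, 2.2, 1.9],
    "probe_flat": [2.0, 3.0, 1.0, 2.0, 3.0, 1.0],
}
signature = select_probe_sets(expression, status, cutoff=0.43)
```

With this toy input, the two probe sets that track the status (one positively, one negatively) pass the 0.43 cutoff, while the uninformative one is dropped.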
Gene signature predictive of ER status.
| Gene Symbol | Correlation Coefficient | Gene Title |
| ADCY9 | 0.44 | adenylate cyclase 9 |
| AMFR | 0.44 | autocrine motility factor receptor |
| ANXA9 | 0.43 | annexin A9 |
| | 0.45 | |
| C6orf97 | 0.45 | chromosome 6 open reading frame 97 |
| CA12 | 0.48 | carbonic anhydrase XII |
| | 0.48 | |
| | 0.47 | |
| | 0.47 | |
| | 0.47 | |
| CELSR1 | 0.43 | cadherin, EGF LAG seven-pass G-type receptor 1 (flamingo homolog, Drosophila) |
| CYP2B6 /// CYP2B7P1 | 0.45 | cytochrome P450, family 2, subfamily B, polypeptides 6, 7 |
| ESR1 | 0.50 | estrogen receptor 1 |
| FAM176B | 0.46 | family with sequence similarity 176, member B |
| | 0.43 | |
| GAMT | 0.45 | guanidinoacetate N-methyltransferase |
| GATA3 | 0.45 | GATA binding protein 3 |
| | 0.48 | |
| | 0.47 | |
| GFRA1 | 0.45 | GDNF family receptor alpha 1 |
| GREB1 | 0.46 | growth regulation by estrogen in breast cancer 1 |
| IL6ST | 0.44 | interleukin 6 signal transducer (gp130, oncostatin M receptor) |
| | 0.44 | |
| KCNK15 | 0.44 | potassium channel, subfamily K, member 15 |
| KDM4B | 0.45 | lysine (K)-specific demethylase 4B |
| | 0.46 | |
| | 0.43 | |
| SCCPDH | 0.43 | saccharopine dehydrogenase (putative) |
| SCUBE2 | 0.46 | signal peptide, CUB domain, EGF-like 2 |
| LIV1 (SLC39A6) | 0.46 | solute carrier family 39 (zinc transporter), member 6 |
| SSH3 | 0.44 | slingshot homolog 3 (Drosophila) |
| STC2 | 0.44 | stanniocalcin 2 |
| TBC1D9 | 0.43 | TBC1 domain family, member 9 (with GRAM domain) |
| TFF1 | 0.44 | trefoil factor 1 |
| Unknown | 0.45 | Not annotated |
Each row in the correlation coefficient column represents a probe set. Genes whose expression levels were previously reported to correlate with ER status are marked in bold. The rows are sorted alphabetically by gene symbol. For detailed information on the probe sets, see Table S1.
Correlation of microarray–based expression profiling data with routinely established ER status.
| Set | Dataset | Total | ER status defined by predictor | Clinical ER status: Negative | Clinical ER status: Positive | p-value* |
| Training | GSE3494 | 247 | Negative | 31 | 25 | <2.2·10^−16 |
| | | | Positive | 3 | 188 | |
| | GSE3494 | 247 | Negative | 29 | 22 | <2.2·10^−16 |
| | | | Positive | 5 | 191 | |
| Validation | GSE2034 | 286 | Negative | 68 | 26 | <2.2·10^−16 |
| | | | Positive | 9 | 183 | |
| | GSE7390 | 198 | Negative | 52 | 10 | <2.2·10^−16 |
| | | | Positive | 12 | 124 | |
| | GSE2603 | 97 | Negative | 41 | 2 | <2.2·10^−16 |
| | | | Positive | 0 | 54 | |
| | GSE20271 | 144 | Negative | 54 | 16 | 4.227·10^−13 |
| | | | Positive | 13 | 61 | |
| | GSE20194 | 278 | Negative | 103 | 17 | <2.2·10^−16 |
| | | | Positive | 11 | 147 | |
*Fisher's exact test.
**The analysis was performed using the “best probe set” (“205225_at”). The remaining analyses were performed using the 24-gene ER signature.
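As a worked illustration of how such a contingency table translates into sensitivity, specificity and a Fisher's exact test p-value, the sketch below uses the GSE2603 counts from the table above. It assumes that the rows give the predictor call and the columns give the clinical status, so that predictor-negative/clinical-negative is a true negative; the function names are hypothetical, and the two-sided test sums all tables that are no more probable than the observed one, which is a common convention and not necessarily the exact procedure the authors used.

```python
from math import comb


def sensitivity_specificity(tn, fn, fp, tp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):
        # Number of ways to obtain x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Exact integer comparison avoids floating-point tie problems.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, col1)


# GSE2603 (ER): predictor Negative -> 41 clinical-negative, 2 clinical-positive;
#               predictor Positive -> 0 clinical-negative, 54 clinical-positive.
tn, fn, fp, tp = 41, 2, 0, 54
sens, spec = sensitivity_specificity(tn, fn, fp, tp)
p_value = fisher_exact_two_sided(tn, fn, fp, tp)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} p={p_value:.3g}")
```

Under these assumptions the sensitivity is 54/56 and the specificity is 41/41, and the p-value falls below the 2.2·10^−16 bound reported in the table.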
Figure 2. ER status determination: sensitivity (‘+’) and specificity (‘-’) obtained with two different microarray-based methods.
The improved feature is highlighted by gray background. * Training set.
Figure 3. Selecting a set of genes predictive of ERBB2 status based on sensitivity and specificity.
The cutoff is based on Spearman rank correlation coefficients. The number of probe sets in each signature is marked by the number under the lowest curve. Black filled circles – specificity; gray circles – sensitivity; black line – sum of specificity and sensitivity. The optimal number of probe sets was 19, with the Spearman rank correlation coefficient cutoff set at 0.35. Gray line and “*” indicate the sum of specificity and sensitivity of the prediction obtained by using a single “best probe set” (“203497_at”).
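The criterion plotted in this figure, the sum of specificity and sensitivity across candidate cutoffs, can be sketched as a simple maximization. The candidate values below are hypothetical placeholders rather than numbers read from the figure, and the function name is an assumption made for illustration.

```python
def best_cutoff(candidates):
    """Return the candidate with the largest sensitivity + specificity.

    On ties, the first candidate in the list wins (behavior of max()).
    """
    return max(candidates, key=lambda c: c["sensitivity"] + c["specificity"])


# Hypothetical evaluation results for three correlation cutoffs.
# These numbers are illustrative only and do not come from the paper.
candidates = [
    {"cutoff": 0.20, "n_probe_sets": 60, "sensitivity": 0.70, "specificity": 0.95},
    {"cutoff": 0.30, "n_probe_sets": 25, "sensitivity": 0.85, "specificity": 0.93},
    {"cutoff": 0.40, "n_probe_sets": 8, "sensitivity": 0.90, "specificity": 0.80},
]
chosen = best_cutoff(candidates)
print(chosen["cutoff"], chosen["n_probe_sets"])
```

Maximizing the plain sum weights sensitivity and specificity equally; a different weighting would be needed if false negatives and false positives carried different costs.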
Gene signature predictive of ERBB2 status.
| Gene Symbol | Correlation Coefficient | Gene description |
| Positive Spearman correlation | ||
| CRK7 (CDK12) | 0.38 | cyclin-dependent kinase 12 |
| ERBB2 | 0.42 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) |
| | 0.39 | |
| F2RL1 | 0.35 | coagulation factor II (thrombin) receptor-like 1 |
| GRB7 | 0.43 | growth factor receptor-bound protein 7 |
| IDI1 | 0.37 | isopentenyl-diphosphate delta isomerase 1 |
| ITGB6 | 0.36 | integrin, beta 6 |
| | 0.35 | |
| PERLD1 | 0.37 | post-GPI attachment to proteins 3 |
| | 0.38 | |
| PPARBP | 0.45 | mediator complex subunit 1 |
| | 0.39 | |
| SEC63 | 0.37 | SEC63 homolog (S. cerevisiae) |
| STARD3 | 0.37 | StAR-related lipid transfer (START) domain containing 3 |
| TRIM26 | 0.36 | tripartite motif-containing 26 |
| Negative Spearman correlation | ||
| DIRAS2 | -0.36 | DIRAS family, GTP-binding RAS-like 2 |
| DUSP24 | -0.36 | serine/threonine/tyrosine interacting-like 1 |
| UBTF | -0.36 | upstream binding transcription factor, RNA polymerase I |
| Unknown | -0.37 | Not annotated |
Each row in the correlation coefficient column represents a probe set. Genes within the borders of the ERBB2 amplicon are marked in bold. The list of genes is divided into genes with positive and negative (-) correlation coefficients. For detailed information on the probe sets, see Table S2.
Correlation of microarray–based expression profiling data with routinely established ERBB2 status.
| Set | Dataset | Total | ERBB2 status defined by predictor | Clinical ERBB2 status: Negative | Clinical ERBB2 status: Positive | p-value* |
| Training | GSE2603 | 88 | Negative | 75 | 2 | 1.712·10^−6 |
| | | | Positive | 4 | 7 | |
| | GSE2603 | 88 | Negative | 78 | 1 | <4.4·10^−8 |
| | | | Positive | 2 | 7 | |
| | GSE20271 | 144 | Negative | 115 | 9 | 2.287·10^−8 |
| | | | Positive | 7 | 13 | |
| | GSE20271 | 144 | Negative | 115 | 13 | <5.2·10^−5 |
| | | | Positive | 7 | 9 | |
| Validation | GSE20194 | 278 | Negative | 218 | 14 | <2.2·10^−16 |
| | | | Positive | 1 | 45 | |
| | GSE16446 | 93 | Negative | 61 | 5 | <2.2·10^−16 |
| | | | Positive | 1 | 26 | |
*Fisher's exact test.
**The analysis was performed using the “best probe set” (“203497_at”). The remaining analyses were performed using the 14-gene ERBB2 signature.
Figure 4. ERBB2 status determination: sensitivity (‘+’) and specificity (‘-’) obtained with two different microarray-based methods.
The improved feature is highlighted by gray background. Datasets GSE2603, GSE20271 and GSE20194 were profiled on HG-U133A GeneChips; GSE16446 was profiled on HG-U133 Plus 2.0 GeneChips. * Training set.
Figure 5. Selecting a set of genes predictive of PR status based on sensitivity and specificity.
The cutoff is based on Spearman rank correlation coefficients. The number of probe sets in each signature is marked by the number under the lowest curve. Black filled circles – specificity; gray circles – sensitivity; black line – sum of specificity and sensitivity. The optimal number of probe sets was 61, with the Spearman rank correlation coefficient cutoff set at 0.38. Gray line and “*” indicate the sum of specificity and sensitivity of the prediction obtained by using a single “best probe set” (“219197_s_at”).
Gene signature predictive of PR status.
| Gene Symbol | Correlation Coefficient | Gene Title |
| Positive Spearman correlation | ||
| BBS1 | 0.40 | Bardet-Biedl syndrome 1 |
| | 0.40 | |
| BCAM | 0.40 | basal cell adhesion molecule (Lutheran blood group) |
| CA12 | 0.39 | carbonic anhydrase XII |
| | 0.40 | |
| CASC1 | 0.39 | cancer susceptibility candidate 1 |
| FAM176B | 0.41 | family with sequence similarity 176, member B |
| | 0.44 | |
| GAMT | 0.40 | guanidinoacetate N-methyltransferase |
| GATA3 | 0.39 | GATA binding protein 3 |
| | 0.41 | |
| GFRA1 | 0.39 | GDNF family receptor alpha 1 |
| GLI3 | 0.39 | GLI family zinc finger 3 |
| HPN | 0.39 | hepsin |
| IL6ST | 0.40 | interleukin 6 signal transducer (gp130, oncostatin M receptor) |
| | 0.41 | |
| KDM4B | 0.41 | lysine (K)-specific demethylase 4B |
| | 0.42 | |
| LAMB2 | 0.40 | laminin, beta 2 (laminin S) |
| LRRC17 | 0.39 | leucine rich repeat containing 17 |
| LZTFL1 | 0.39 | leucine zipper transcription factor-like 1 |
| MAGED2 | 0.39 | melanoma antigen family D, 2 |
| MAPT | 0.39 | microtubule-associated protein tau |
| | 0.40 | |
| PDE4A | 0.38 | phosphodiesterase 4A, cAMP-specific (phosphodiesterase E2 dunce homolog, Drosophila) |
| PGR | 0.41 | progesterone receptor |
| SCUBE2 | 0.44 | signal peptide, CUB domain, EGF-like 2 |
| LIV1 (SLC39A6) | 0.38 | solute carrier family 39 (zinc transporter), member 6 |
| STARD13 | 0.39 | StAR-related lipid transfer (START) domain containing 13 |
| STC2 | 0.38 | stanniocalcin 2 |
| WDR19 | 0.40 | WD repeat domain 19 |
| Unknown | 0.40 | Not annotated |
| Negative Spearman correlation | ||
| AURKA | −0.40 | aurora kinase A |
| | −0.38 | |
| BUB1 | −0.41 | budding uninhibited by benzimidazoles 1 homolog (yeast) |
| C16orf61 | −0.38 | chromosome 16 open reading frame 61 |
| CCNA2 | −0.40 | cyclin A2 |
| CDC20 | −0.40 | cell division cycle 20 homolog (S. cerevisiae) |
| CDCA8 | −0.39 | cell division cycle associated 8 |
| CENPA | −0.38 | centromere protein A |
| CENPN | −0.38 | centromere protein N |
| CEP55 | −0.39 | centrosomal protein 55 kDa |
| DBF4 | −0.44 | DBF4 homolog (S. cerevisiae) |
| DDX39 | −0.39 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 39 |
| DLGAP5 | −0.39 | discs, large (Drosophila) homolog-associated protein 5 |
| GATAD2A | −0.41 | GATA zinc finger domain containing 2A |
| GTSE1 | −0.38 | G-2 and S-phase expressed 1 |
| HJURP | −0.39 | Holliday junction recognition protein |
| KIF2C | −0.41 | kinesin family member 2C |
| | −0.41 | |
| KPNA2 | −0.41 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) |
| LAD1 | −0.41 | ladinin 1 |
| LPIN1 | −0.40 | lipin 1 |
| MCAM | −0.39 | melanoma cell adhesion molecule |
| MELK | −0.40 | maternal embryonic leucine zipper kinase |
| MKI67 | −0.39 | antigen identified by monoclonal antibody Ki-67 |
| OR7E37P | −0.39 | olfactory receptor, family 7, subfamily E, member 37 pseudogene |
| PSME4 | −0.42 | proteasome (prosome, macropain) activator subunit 4 |
| PTTG1 | −0.40 | pituitary tumor-transforming 1 |
| SLC7A5 | −0.40 | solute carrier family 7 (cationic amino acid transporter, y+system), member 5 |
| TTK | −0.40 | TTK protein kinase |
Each row in the correlation coefficient column represents a probe set. Genes whose expression levels are reported to correlate with PR are marked in bold. Genes that also occur in the signature predictive of ER status are marked in italics. Genes whose expression levels have been reported in the literature to correlate with ER status are marked by an asterisk. The list of genes is divided into those with positive and negative (-) correlation coefficients. For detailed information on the probe sets, see Table S3.
Correlation of microarray–based expression profiling data with routinely established PR status.
| Set | Dataset | Total | PR status defined by predictor | Clinical PR status: Negative | Clinical PR status: Positive | p-value* |
| Training | GSE3494 | 251 | Negative | 45 | 31 | 2.3·10^−16 |
| | | | Positive | 16 | 159 | |
| | GSE3494 | 251 | Negative | 29 | 17 | <3.3·10^−10 |
| | | | Positive | 32 | 173 | |
| Validation | GSE20271 | 144 | Negative | 63 | 15 | 6.1·10^−12 |
| | | | Positive | 16 | 50 | |
| | GSE20194 | 278 | Negative | 107 | 22 | <2.2·10^−16 |
| | | | Positive | 50 | 99 | |
| | GSE9195 | 79 | Negative | 9 | 24 | 0.1484 |
| | | | Positive | 6 | 40 | |
*Fisher's exact test.
**The analysis was performed using the “best probe set” (“219197_s_at”). The remaining analyses were performed using the 51-gene PR signature.
Figure 6. PR status determination: sensitivity (‘+’) and specificity (‘-’) obtained with two different microarray-based methods.
The improved feature is highlighted by gray background. Datasets GSE3494, GSE20271 and GSE20194 were profiled on HG-U133A GeneChips; GSE9195 was profiled on HG-U133 Plus 2.0 GeneChips. * Training set.
Sources of the samples and methods used to obtain the clinical information about the samples.
| Dataset | Total number of profiled samples | ER assessment | | | PR assessment | | ERBB2 assessment |
| | | IHC | EIA* | Other assay | IHC | Biochemical assay | IHC or FISH |
| GSE2034 | 286 | 9 | 277 | - | - | - | - |
| GSE3494 | 251 | - | - | 247 biochemical assay | 251 | - | |
| GSE7390 | 198 | 198 | - | - | - | - | - |
| GSE2603 | 121 | 97 either IHC, EIA or biochemical assay | - | - | 88 IHC |
| GSE20271 | 144 | 144 | - | - | 144 | - | 144 either IHC or FISH |
| GSE20194 | 278 | 278 | - | - | 278 | - | 278 either IHC or FISH |
| GSE16446 | 120 | - | - | - | - | - | 93 FISH |
| GSE9195 | 79 | - | - | 79 ligand binding | | | |
*enzymatic immunoassay – EIA [64].
Figure 7. Algorithm for finding the gene signatures predictive of ER, PR or ERBB2 status.
The method was used on HG-U133A GeneChip arrays containing 22,283 probe sets. ER – estrogen receptor; PR – progesterone receptor.