Lei Xu, Aik Choon Tan, Raimond L Winslow, Donald Geman.
Abstract
BACKGROUND: There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.
Year: 2008 PMID: 18304324 PMCID: PMC2409450 DOI: 10.1186/1471-2105-9-125
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Training data sets: lymph-node-negative patients with no adjuvant treatment
| Data set | Patients | Good outcome | Poor outcome |
|---|---|---|---|
| Miller [25] | 106 | 92 | 14 |
| Sotiriou [11] | 43 | 30 | 13 |
| Wang [7] | 209 | 114 | 95 |
| Total | 358 | 236 | 122 |
Figure 1. Choosing the size of the signature. The relationship between the number of features in a prognostic signature and the specificity at 90% sensitivity of the corresponding prognostic test, evaluated by 40-fold cross-validation. We select m = 80, the smallest value that achieves roughly maximum specificity at the 90% sensitivity level. The specificity observed on the validation set is in fact higher.
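The selection rule in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the tolerance `tol`, the convention that a higher score means a poorer predicted outcome, and all numbers are assumptions made for the example.

```python
import numpy as np

def specificity_at_sensitivity(scores, labels, target_sens=0.90):
    """Best specificity over thresholds whose sensitivity >= target_sens.

    labels: 1 = poor outcome (positive), 0 = good outcome.
    A sample is called positive when its score is >= the threshold
    (assumed convention). Both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    best = 0.0
    for t in np.unique(scores):
        sens = np.mean(pos >= t)
        if sens >= target_sens:
            best = max(best, float(np.mean(neg < t)))
    return best

def smallest_near_max(sizes, specs, tol=0.01):
    """Smallest signature size whose specificity is within tol of the maximum."""
    specs = np.asarray(specs, dtype=float)
    cutoff = specs.max() - tol
    return min(s for s, v in zip(sizes, specs) if v >= cutoff)

# Hypothetical cross-validated scores for one signature size.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_spec = specificity_at_sensitivity(toy_scores, toy_labels)

# Hypothetical specificity curve over candidate sizes (not the paper's values).
chosen = smallest_near_max([5, 10, 20, 40], [0.30, 0.41, 0.50, 0.505])
```

How close "roughly maximum" has to be is a judgment call; the tolerance here simply makes that choice explicit.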
Genes in the identified prognostic signature. For each probe set, the first column lists the subset of the eighty pairs that contain it. The pairs are ordered from 1 to 80 by their scores.
| Pairs | Probe set | Gene symbol | Description |
|---|---|---|---|
| 1, 43 | 91816_f_at | RKHD1 | ring finger and KH domain containing 1 |
| 1, 6, 73 | 204641_at | NEK2 | NIMA (never in mitosis gene a)-related kinase 2 |
| 2 | 213139_at | SNAI2 | snail homolog 2 (Drosophila) |
| 2, 4, 9, 33 | 212188_at | KCTD12 | potassium channel tetramerisation domain containing 12 |
| 3 | 212022_s_at | MKI67 | antigen identified by monoclonal antibody Ki-67 |
| 3, 61, 80 | 219716_at | APOL6 | apolipoprotein L, 6 |
| 4 | 205264_at | CD3EAP | CD3e molecule, epsilon associated protein |
| 5 | 206687_s_at | PTPN6 | protein tyrosine phosphatase, non-receptor type 6 |
| 5, 67 | 218009_s_at | PRC1 | protein regulator of cytokinesis 1 |
| 6, 35, 39, 55 | 219579_at | RAB3IL1 | RAB3A interacting protein (rabin3)-like 1 |
| 7 | 221824_s_at | MARCH8 | membrane-associated ring finger (C3HC4) 8 |
| 7 | 209574_s_at | C18orf1 | chromosome 18 open reading frame 1 |
| 8 | 210199_at | CRYAA | crystallin, alpha A |
| 8, 24, 26, 31 | 219493_at | SHCBP1 | SHC SH2-domain binding protein 1 |
| 9 | 204177_s_at | KLHL20 | kelch-like 20 (Drosophila) |
| 10, 34 | 203010_at | STAT5A | signal transducer and activator of transcription 5A |
| 10 | 212747_at | ANKS1A | ankyrin repeat and sterile alpha motif domain containing 1A |
| 11, 19, 21 | 205034_at | CCNE2 | cyclin E2 |
| 11, 65 | 217427_s_at | HIRA | HIR histone cell cycle regulation defective homolog A (S. cerevisiae) |
| 12, 46, 54, 74 | 222077_s_at | RACGAP1 | Rac GTPase activating protein 1 |
| 12, 62 | 36545_s_at | SFI1 | Sfi1 homolog, spindle assembly associated (yeast) |
| 13, 17, 72 | 218883_s_at | MLF1IP | MLF1 interacting protein |
| 13 | 203332_s_at | INPP5D | inositol polyphosphate-5-phosphatase, 145kDa |
| 14, 15 | 211584_s_at | NPAT | nuclear protein, ataxia-telangiectasia locus |
| 14 | 219512_at | C20orf172 | chromosome 20 open reading frame 172 |
| 15 | 221193_s_at | ZCCHC10 | zinc finger, CCHC domain containing 10 |
| 16 | 221521_s_at | GINS2 | GINS complex subunit 2 (Psf2 homolog) |
| 16 | 209671_x_at | TRA@///TRAC | T cell receptor alpha locus///T cell receptor alpha locus |
| 17 | 208952_s_at | LARP5 | La ribonucleoprotein domain family, member 5 |
| 18, 30 | 218726_at | DKFZp762E1312 | hypothetical protein DKFZp762E1312 |
| 18, 51 | 211581_x_at | LST1 | leukocyte specific transcript 1 |
| 19 | 221273_s_at | DKFZP761H1710 | hypothetical protein DKFZp761H1710 |
| 20 | 205395_s_at | MRE11A | MRE11 meiotic recombination 11 homolog A (S. cerevisiae) |
| 20, 59 | 214973_x_at | IGHD | immunoglobulin heavy constant delta |
| 21, 27 | 211881_x_at | IGLJ3 | immunoglobulin lambda joining 3 |
| 22 | 202602_s_at | HTATSF1 | HIV-1 Tat specific factor 1 |
| 22 | 218143_s_at | SCAMP2 | secretory carrier membrane protein 2 |
| 23 | 212911_at | DNAJC16 | DnaJ (Hsp40) homolog, subfamily C, member 16 |
| 23 | 204817_at | ESPL1 | extra spindle poles like 1 (S. cerevisiae) |
| 24 | 215783_s_at | ALPL | alkaline phosphatase, liver/bone/kidney |
| 25, 38, 39, 44, 52, 71 | 204825_at | MELK | maternal embryonic leucine zipper kinase |
| 25 | 213689_x_at | RPL5 | Ribosomal protein L5 |
| 26 | 206545_at | CD28 | CD28 molecule |
| 27 | 206364_at | KIF14 | kinesin family member 14 |
| 28, 60, 61 | 208079_s_at | AURKA | aurora kinase A |
| 28 | 214955_at | TMPRSS6 | transmembrane protease, serine 6 |
| 29 | 210966_x_at | LARP1 | La ribonucleoprotein domain family, member 1 |
| 29 | 218830_at | RPL26L1 | ribosomal protein L26-like 1 |
| 30 | 204498_s_at | ADCY9 | adenylate cyclase 9 |
| 31 | 206211_at | SELE | selectin E (endothelial adhesion molecule 1) |
| 32, 34, 69 | 201890_at | RRM2 | ribonucleotide reductase M2 polypeptide |
| 32 | 219298_at | ECHDC3 | enoyl Coenzyme A hydratase domain containing 3 |
| 33 | 204847_at | ZBTB11 | zinc finger and BTB domain containing 11 |
| 35, 62 | 203214_x_at | CDC2 | cell division cycle 2, G1 to S and G2 to M |
| 36 | 204605_at | CGRRF1 | cell growth regulator with ring finger domain 1 |
| 36 | 211251_x_at | NFYC | nuclear transcription factor Y, gamma |
| 37, 65 | 213008_at | KIAA1794 | KIAA1794 |
| 37, 73 | 210042_s_at | CTSZ | cathepsin Z |
| 38 | 203595_s_at | IFIT5 | interferon-induced protein with tetratricopeptide repeats 5 |
| 40 | 221529_s_at | PLVAP | plasmalemma vesicle associated protein |
| 40 | 202114_at | SNX2 | sorting nexin 2 |
| 41 | 211779_x_at | AP2A2 | adaptor-related protein complex 2, alpha 2 subunit |
| 41, 63 | 202324_s_at | ACBD3 | acyl-Coenzyme A binding domain containing 3 |
| 42, 57 | 201821_s_at | TIMM17A | translocase of inner mitochondrial membrane 17 homolog A (yeast) |
| 42 | 201551_s_at | LAMP1 | lysosomal-associated membrane protein 1 |
| 43 | 48808_at | DHFR | dihydrofolate reductase |
| 44 | 211643_x_at | LOC651961 | Myosin-reactive immunoglobulin light chain variable region |
| 45 | 210396_s_at | LOC440354 | PI-3-kinase-related kinase SMG-1 pseudogene |
| 45 | 201070_x_at | SF3B1 | splicing factor 3b, subunit 1, 155kDa |
| 46 | 207391_s_at | PIP5K1A | phosphatidylinositol-4-phosphate 5-kinase, type I, alpha |
| 47 | 200800_s_at | HSPA1A | heat shock 70 kDa protein 1A |
| 47 | 201009_s_at | TXNIP | thioredoxin interacting protein |
| 48 | 203530_s_at | STX4 | syntaxin 4 |
| 48, 50 | 218085_at | CHMP5 | chromatin modifying protein 5 |
| 49, 68, 70 | 219555_s_at | C16orf60 | chromosome 16 open reading frame 60 |
| 49 | 210419_at | BARX2 | BarH-like homeobox 2 |
| 50 | 214119_s_at | FKBP1A | FK506 binding protein 1A, 12 kDa |
| 51, 58 | 203362_s_at | MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) |
| 52 | 218910_at | TMEM16K | transmembrane protein 16K |
| 53 | 208838_at | KIAA0829 | KIAA0829 protein |
| 53 | 212081_x_at | BAT2 | HLA-B associated transcript 2 |
| 54 | 202115_s_at | NOC2L | nucleolar complex associated 2 homolog (S. cerevisiae) |
| 55 | 209714_s_at | CDKN3 | cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase) |
| 56 | 205701_at | IPO8 | importin 8 |
| 56 | 205063_at | SIP1 | survival of motor neuron protein interacting protein 1 |
| 57 | 200918_s_at | SRPR | signal recognition particle receptor ('docking protein') |
| 58 | 212527_at | D15Wsu75e | DNA segment, Chr 15, Wayne State University 75, expressed |
| 59 | 204244_s_at | DBF4 | DBF4 homolog (S. cerevisiae) |
| 60 | 214508_x_at | CREM | cAMP responsive element modulator |
| 63 | 200787_s_at | PEA15 | phosphoprotein enriched in astrocytes 15 |
| 64 | 203764_at | DLG7 | discs, large homolog 7 (Drosophila) |
| 64 | 205877_s_at | ZC3H7B | zinc finger CCCH-type containing 7B |
| 66 | 200848_at | AHCYL1 | S-adenosylhomocysteine hydrolase-like 1 |
| 66 | 201091_s_at | CBX3 | chromobox homolog 3 (HP1 gamma homolog, Drosophila) |
| 67 | 64064_at | GIMAP5 | GTPase, IMAP family member 5 |
| 68 | 211649_x_at | IGHG1 | Immunoglobulin heavy constant gamma 1 (G1m marker) |
| 69 | 204398_s_at | EML2 | echinoderm microtubule associated protein like 2 |
| 70 | 220433_at | PRRG3 | proline rich Gla (G-carboxyglutamic acid) 3 (transmembrane) |
| 71 | 219169_s_at | TFB1M | transcription factor B1, mitochondrial |
| 72 | 34689_at | TREX1 | three prime repair exonuclease 1 |
| 74 | 212604_at | MRPS31 | mitochondrial ribosomal protein S31 |
| 75 | 213907_at | EEF1E1 | Eukaryotic translation elongation factor 1 epsilon 1 |
| 75 | 209622_at | STK16 | serine/threonine kinase 16 |
| 76 | 209716_at | CSF1 | colony stimulating factor 1 (macrophage) |
| 76 | 219575_s_at | | peptide deformylase (mitochondrial) |
| 77 | 219328_at | DDX31 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 31 |
| 77 | 213121_at | SNRP70 | small nuclear ribonucleoprotein 70 kDa polypeptide (RNP antigen) |
| 78 | 218870_at | ARHGAP15 | Rho GTPase activating protein 15 |
| 78 | 219105_x_at | ORC6L | origin recognition complex, subunit 6 like (yeast) |
| 79 | 216510_x_at | IGHA1 | immunoglobulin heavy constant alpha 1 |
| 79 | 215207_x_at | YDD19 | YDD19 protein |
| 80 | 219918_s_at | ASPM | asp (abnormal spindle)-like, microcephaly associated (Drosophila) |
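Because the table is indexed by probe set rather than by pair, recovering the two members of each pair means inverting its first column. The sketch below does this for the first seven rows only; the tuple layout and the function name `pairs_from_rows` are assumptions made for illustration.

```python
from collections import defaultdict

# First seven rows of the table: (pair indices, probe set, gene symbol).
rows = [
    ("1, 43", "91816_f_at", "RKHD1"),
    ("1, 6, 73", "204641_at", "NEK2"),
    ("2", "213139_at", "SNAI2"),
    ("2, 4, 9, 33", "212188_at", "KCTD12"),
    ("3", "212022_s_at", "MKI67"),
    ("3, 61, 80", "219716_at", "APOL6"),
    ("4", "205264_at", "CD3EAP"),
]

def pairs_from_rows(rows):
    """Map each pair index to the gene symbols listed for it, in table order."""
    members = defaultdict(list)
    for pair_list, _probe, symbol in rows:
        for idx in pair_list.split(","):
            members[int(idx)].append(symbol)
    return dict(members)

pairs = pairs_from_rows(rows)
# Pairs with both members present in this excerpt.
complete = {k: v for k, v in pairs.items() if len(v) == 2}
```

On this excerpt only pairs 1 to 4 come out complete (for example, pair 1 is RKHD1 with NEK2); the remaining indices need the rest of the table.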
Figure 2. Heat map of the 80 signature gene pairs. The Wang data set is used to illustrate the gene expression values of the signature genes. The heat map is generated using the matrix2png software [34]. There are 80 rows corresponding to the 80 gene pairs; the displayed intensities are the differences between the expression values of the two genes in each pair. Each difference is normalized across the samples to zero mean and unit standard deviation (SD) for visualization purposes. Differences greater than the mean are colored in red and those below the mean are colored in green. The scale indicates the number of SDs above or below the mean.
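The per-pair normalization described in this caption can be sketched as follows. The function name, the use of the sample standard deviation (`ddof=1`), and the toy matrix are assumptions; the actual figure was produced with matrix2png, which this sketch does not reproduce.

```python
import numpy as np

def pair_difference_zscores(expr, pairs):
    """Row-wise z-scores of within-pair expression differences.

    expr:  array of shape (genes, samples)
    pairs: list of (i, j) row indices; each output row is expr[i] - expr[j],
           standardized across samples. Assumes no difference row is constant.
    """
    expr = np.asarray(expr, dtype=float)
    diffs = np.array([expr[i] - expr[j] for i, j in pairs])
    mean = diffs.mean(axis=1, keepdims=True)
    sd = diffs.std(axis=1, ddof=1, keepdims=True)
    return (diffs - mean) / sd

# Toy matrix: 3 genes by 4 samples (hypothetical values).
toy_expr = [[1, 2, 3, 4],
            [0, 0, 0, 0],
            [5, 5, 6, 6]]
z = pair_difference_zscores(toy_expr, [(0, 1), (2, 0)])

# Color rule from the caption: above the row mean is red, below is green.
colors = np.where(z > 0, "red", "green")
```

Standardizing each row separately puts all 80 pairs on a common scale, so a single color bar in SD units applies to every row.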
Figure 3. Kaplan-Meier analysis. Kaplan-Meier estimates of the probability of remaining free of distant metastases for the good-outcome and poor-outcome groups among 159 Pawitan patients. The LRT is based on the integrated data in (A) and on the Wang data set alone in (B). CI denotes confidence interval and the p-value is calculated by the log-rank test.
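For readers unfamiliar with the estimator behind these curves, here is a minimal sketch of the Kaplan-Meier product-limit calculation for a single group. The function name and the toy follow-up data are assumptions; confidence intervals and the log-rank comparison between groups are not implemented.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the metastasis-free probability.

    times:  follow-up time for each patient
    events: 1 if distant metastasis was observed at that time, 0 if censored
    Returns a list of (event time, estimated probability of remaining free).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv = 1.0
    curve = []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)              # still under observation at t
        d = np.sum((times == t) & (events == 1))  # events occurring at t
        surv *= 1.0 - d / at_risk
        curve.append((float(t), float(surv)))
    return curve

# Hypothetical follow-up data for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
```

In the toy data the estimate steps down to 0.8, then 0.6, then 0 at times 1, 2 and 4; censored patients leave the risk set without causing a step.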
Test results on Pawitan data (154 patients)
| Training set | Training samples | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Sotiriou | 43 | 51.4 | 47.1 |
| Miller | 106 | 100.0 | 15.1 |
| Wang | 209 | 94.3 | 10.1 |
| Integrated | 358 | 88.6 | 54.6 |
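Assuming the two percentage columns report sensitivity and specificity, with poor outcome as the positive class, the quantities can be computed from binary predictions as sketched below. The function name and the toy labels are hypothetical and do not reproduce the Pawitan results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) in percent.

    y_true, y_pred: sequences of 0/1, where 1 = poor outcome (assumed positive).
    Both classes must appear in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)

# Toy example: 4 poor-outcome and 6 good-outcome patients.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

Read this way, the Miller and Wang rows show high sensitivity bought at the cost of very low specificity, while the integrated training set gives a more balanced operating point.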