Yingpu Tian, Baozhen Chen, Pengfei Guan, Yujia Kang, Zhongxian Lu.
Abstract
Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets have been identified as prognosis classifiers, more powerful models are still needed to define effective gene sets for diagnosis and treatment guidance in breast cancer. In the present study, we developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in the mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. We then found that cancer modulated genes were enriched in this list of developmentally associated genes. Furthermore, the developmentally associated genes had a specific expression profile that was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. The list of regulatory genes shared by the developmental and tumorigenesis processes was then defined as an 835-member prognosis classifier, which was able to predict clinical outcome in three groups of breast cancer patients (predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors, which were around 2.0-2.8). In conclusion, our results identified conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that the developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer.
Year: 2013 PMID: 23565194 PMCID: PMC3614930 DOI: 10.1371/journal.pone.0060131
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Definition of mammary gland developmentally associated genes.
A. Overview of the data processing used to define the developmentally associated gene subset. The probes were filtered systematically with two cutoffs: the p value of gene expression among different time points, and the fold ratio of maximum to minimum expression of a gene across developmental time points. The optimal fold cutoff was the one that gave the maximum odds ratio of literature-based mammary gland-cycle associated genes between the developmentally associated and non-developmentally associated gene subsets. A higher odds ratio means that a greater number of developmental genes were correctly classified. B. Curve of odds ratios for the developmentally associated and non-developmentally associated gene subsets, defined by cutoffs with different ratios of maximum to minimum expression of each gene across time points in developmental progress. C. Fisher exact test assessing the frequency of validated cancer genes in the group of mammary gland developmentally associated genes. The validated cancer genes were obtained from previously published papers (File S5).
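The cutoff scan in panels A and B and the enrichment test in panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the example fold values, and the gene labels are hypothetical, and the one-sided form of the Fisher exact test is an assumption, since the legend does not say which alternative was used.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] (requires b * c > 0)."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value: probability of a count >= a in the
    top-left cell under the hypergeometric null (assumed alternative)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def best_fold_cutoff(fold_by_gene, known_genes, cutoffs):
    """Return (cutoff, odds_ratio) with the highest odds ratio.

    Rows of the table: developmental (fold >= cutoff) vs non-developmental.
    Columns: literature-based known gene vs not known.
    Cutoffs that give an undefined odds ratio (b * c == 0) are skipped.
    """
    best = None
    for cutoff in cutoffs:
        dev = {g for g, fold in fold_by_gene.items() if fold >= cutoff}
        non = set(fold_by_gene) - dev
        a = len(dev & known_genes)
        b = len(dev) - a
        c = len(non & known_genes)
        d = len(non) - c
        if b * c == 0:
            continue
        ratio = odds_ratio(a, b, c, d)
        if best is None or ratio > best[1]:
            best = (cutoff, ratio)
    return best


# Hypothetical max/min expression ratios and a hypothetical known-gene list.
folds = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.5,
         "g5": 2.0, "g6": 1.5, "g7": 1.2, "g8": 1.1}
known = {"g1", "g2", "g6"}
print(best_fold_cutoff(folds, known, [2.0, 3.0, 4.0]))
```

In practice a library routine such as a statistics package's Fisher exact test would normally replace the hand-written tail sum; the explicit version is shown only to make the computation visible.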
Figure 2. Identification of genes associated with the developmental phases of growth, lactation, and involution among the mammary gland developmentally associated gene subset.
A. The developmentally associated genes were clustered into three groups by Principal Component Analysis. Expression profiles of genes in the mammary pregnancy cycle are represented as dots in PC1 (1st principal component axis) and PC2 (2nd principal component axis). All probe sets were grouped into three groups: growth (PC1>0), involution (PC1<0 & PC2>0) and lactation (PC1<0 & PC2<0), based on the number of genes that have peak expression at a particular developmental time (shown in B). B. The time of peak expression for each developmentally associated gene was plotted on a histogram and classified according to the developmental phase (growth, yellow; lactation, blue; involution, purple). Each column represents the number of genes that have peak expression at a particular developmental time. C. The frequency of literature-based cancer modulated genes in the gene subsets associated with the three different stages of mammary gland development. The “growth” group contained more literature-based cancer modulated genes (20%) than the “lactation” (14.7%) and the “involution” (17%) groups (p<0.05).
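The grouping rule in panel A can be written as a small function. This is only a sketch: it assumes the PC1 and PC2 scores have already been computed, the probe names and scores are hypothetical, and the handling of scores that are exactly zero is an assumption because the legend does not cover that case.

```python
from collections import Counter


def assign_phase(pc1, pc2):
    """Map a probe's (PC1, PC2) scores to a developmental phase using the
    thresholds given in the legend. Scores on a boundary (exactly zero) are
    labelled 'unassigned', which is an assumption of this sketch."""
    if pc1 > 0:
        return "growth"
    if pc1 < 0 and pc2 > 0:
        return "involution"
    if pc1 < 0 and pc2 < 0:
        return "lactation"
    return "unassigned"


# Hypothetical probe scores, for illustration only.
scores = {
    "probe_a": (1.2, -0.3),
    "probe_b": (-0.8, 0.5),
    "probe_c": (-0.4, -0.9),
    "probe_d": (0.6, 0.7),
}
phases = {probe: assign_phase(pc1, pc2) for probe, (pc1, pc2) in scores.items()}
phase_counts = Counter(phases.values())
print(phases)
print(phase_counts)
```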
Figure 3. The expression profiles of mammary gland developmental genes in breast tumors reflect the characteristics of the tumors.
Tumor samples include 5 normal tissues and 11 breast cancer samples (A), 25 mouse mammary gland tumors from six human oncogene transgenic mice (B), and two groups of grade 1 and grade 3 human breast tumors (C). Tumors were first classified based on the expression profiles of mammary gland developmentally associated genes by unsupervised classification with a hierarchical clustering algorithm. The accuracy of classification was then assessed with the Fisher exact test. The non-developmental gene subset was included as a control.
Figure 4. Defining the 835 prognosis classifier from the developmentally associated genes based on their expression in tumors.
For each developmentally associated gene, we first counted the number of breast cancer datasets in which its expression was “altered”. Based on this number of datasets, all developmental genes were then grouped into six subsets (Sub0, Sub1, Sub2, Sub3, Sub4, and Sub5). The percentage of literature-based cancer modulated genes in each subset is shown in the table (A) and the histogram (B). The results for non-developmental genes obtained with the same method are shown as a control. The details are described in the text.
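The counting and grouping step can be sketched as below. The sketch assumes that "SubN" denotes genes altered in exactly N datasets, which is how the legend reads but is not stated outright; the dataset contents, gene labels, and function names are hypothetical.

```python
from collections import defaultdict


def group_by_altered_count(altered_sets, genes):
    """Group genes by the number of datasets in which they are altered.

    altered_sets: list of sets, one per dataset, holding the altered genes.
    Returns a dict such as {"Sub0": {...}, "Sub1": {...}, ...}.
    """
    subsets = defaultdict(set)
    for gene in genes:
        n_altered = sum(gene in altered for altered in altered_sets)
        subsets[f"Sub{n_altered}"].add(gene)
    return dict(subsets)


def known_fraction(subset, known_genes):
    """Fraction of a subset that appears in a literature-based gene list."""
    return len(subset & known_genes) / len(subset) if subset else 0.0


# Three hypothetical datasets (the legend's Sub0..Sub5 imply five).
datasets = [{"A", "B"}, {"A"}, {"A", "C"}]
groups = group_by_altered_count(datasets, ["A", "B", "C", "D"])
print(groups)
print(known_fraction(groups["Sub1"], {"B"}))
```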
Enrichment of ontology terms in the 835 intrinsic genes.
| Process and pathway | Genes |
| --- | --- |
| Cell proliferation/cell cycle (149) | VEGFC, Vegfa, TTK, Trp53#, TOP2A, TNFRSF1B, THRA, TGFBI, TGFB3, TGFB2, TGFB1, TACC3, STAT1, SSR1, SPP1, Snk, SET, SESN1, S100A11, RRM2, RRM1, RPA3, RNF2, RFC4, RELB, RECK, RBBP6, RB1, RAD51, QSCN6, PTTG1, PTN, PTHLH, PRC1, PMP22, PLAGL1, PKD2, PGF, PDGFRA, PDGFB, PCNA, PCM1, PA2G4, ORC6L, NUSAP1, Nrp, NRAS, NME1, NFIB, NFIA, NEK2, NDN, MYC, MYB, MST1R, MKI67, MCM7, MCM5, MCM3, MAD2L1, LRP1, LIG1, KPNA2, KIT, KIF2C, KIF11, JUNB, ISG20, IL15, IGFBP6, IGF2, IGF1, IFRD2, IFNAR2, IFI16, Idb2, HK2, GTF2H1, GPS1, GPC4, Gnrh, GAS6, GAS1, FYN, FOS, FLT3, FLT1, FIGF, FGR, FEN1, F2R, ETS2, ERBB3, EPS15, ENPEP, EMP2, ELF5, EGFR, ECT2, DUSP6, DOCK2, DDIT3, DAB2, CYR61, CXCL1, CSF1R, CRIP2, CRIP1, CORO1A, COL18A1, CKS2, Cks1, CHEK1, CHAF1B, CDKN2C, CDKN1C, CDC6, CDC34, Cdc2a, CDC25A, CDC20, CD68, CCNG2, CCND2, CCND1, CCNB2, Ccnb1-rs1, CCNA2, Calml4, CALM3, BUB1, BTG3, BTG2, BTG1, BIRC5, BIN1, BAT3, AREG, APRIN, ANXA1, AK1, AIF1, AHR |
| Cell adhesion (82) | ADAM12, AEBP1, ALCAM, AOC3, APP, ARHC, BYSL, CCL2, CD34, CD36, CD44, CD47, CD9, CDH13, CDH2, CDH3, CDH5, ACAM1, Ceacam2, CELSR2, CNTN1, COL14A1, COL15A1, COL18A1, COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, COL6A1, COL6A2, COL6A3, COL7A1, COL8A1, CSPG2, CTGF, CYR61, DDR2, DPT, DSC2, ICAM1, CAM2, ISLR, ITGA6, ITGA7, ITGB2, LAMA2, LAMA4, Lamb1-1, LAMC1, LAMC2, LGALS3, LGALS8, LGALS9, MPDZ, MRC1, NID2, Nrp, PCDH7, PKD2, PRLR, PTK7, PTPNS1, PTPRF, SGCE, SPP1, SRPX, STAT5A, STAT5B, TEK, TGFBI, THBS1, THBS2, TPBG, TSTA3, VCAM1, VWF |
| Angiogenesis (13) | COL18A1, TEK, Vegfa, THBS1, Agpt2, FIGF, CYR61, PGF, CTGF, SERPINE1, VEGFC, Nrp, FLT1 |
| Blood vessel development | COL18A1, TEK, Vegfa, THBS1, Qk, Agpt2, FIGF, CYR61, PGF, CTGF, SERPINE1, VEGFC, Nrp, PPAP2B, FLT1 |
| Cell cycle pathway | MCM3, MCM5, PTTG1, PCNA, TGFB3, CDC6, Trp53, CDKN2C, RB1, TGFB2, MCM7, CCNB2, CDKN1C, ORC6L, CDC20, CDC25A, CCND2, CCNA2, MAD2L1 |
| Cell growth and death pathway | MCM3, MCM5, PTTG1, PCNA, CASP1, TGFB3, CDC6, Trp53, CDKN2C, BAD, RB1, TGFB2, MCM7, CCNB2, TNFRSF1B, CDKN1C, ORC6L, CDC20, CDC25A, CCND2, PTPN13, CCNA2, MAD2L1 |
| VEGF receptor activity | PDGFRB, Nrp, PDGFRA, FLT3, FLT1, KIT |
EASE score<0.05.
Genes shown in red are cancer mutant genes identified in the reference (Nat Rev Cancer, 4(3):177).
Figure 5. The 835 prognosis classifier acts as a powerful predictor of clinical outcome in 78 breast cancer patients.
A. The 78 breast cancer samples were first classified by the expression profiles of the 835 prognosis classifier, using unsupervised classification. The survival curves of the two groups were then compared with Kaplan-Meier analysis to define clinical outcome (lower panel). Accuracy of classification was assessed with the Fisher exact test (upper table). B. The distribution of tumor risk factors in the four groups classified by clinical metastasis and by the 835 prognosis classifier. C. The prognostic value of the 835 prognosis classifier and tumor risk factors.
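For readers unfamiliar with the survival curves in panel A, the Kaplan-Meier estimate for one group can be sketched as follows. This is a textbook product-limit estimator written for illustration, not the authors' analysis code; the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. metastasis) occurred at that time,
            0 if the patient was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Hypothetical follow-up data for one classifier-defined group.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Comparing two such curves, as in the figure, additionally needs a test such as the log-rank test, which is not shown here.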
Figure 6. The 835 prognosis classifier could predict clinical outcome in a large set of breast cancer patients.
The intrinsic gene set was applied to 144 node-positive and 151 node-negative primary breast tumors. The accuracy of prediction (A) and the prognostic value (B) of the 835 prognosis classifier and tumor risk factors were assessed by the same approach as described in the legend of Fig. 5B.