
Increased epithelial stem cell traits in advanced endometrial endometrioid carcinoma.

Shing-Jyh Chang, Tao-Yeuan Wang, Chan-Yen Tsai, Tzu-Fang Hu, Margaret Dah-Tsyr Chang, Hsei-Wei Wang.

Abstract

BACKGROUND: It has been recognized that cancer cells acquire characteristics reminiscent of those of normal stem cells, and that the degree of stem cell gene expression correlates with patient prognosis. Lgr5(+) or CD133(+) epithelial stem cells (EpiSCs) have recently been identified, and these cells are susceptible to neoplastic transformation. It is unclear, however, whether genes enriched in EpiSCs also contribute to tumor malignancy. Endometrial endometrioid carcinoma (EEC) is the dominant type of endometrial cancer and remains among the most common female cancers. Clinically, endometrial carcinoma is classified into 4 FIGO stages by the degree of tumor invasion and metastasis, and the survival rate is low in patients with higher-stage tumors. Identifying genes shared between advanced tumors and stem cells will not only unmask the mechanisms of tumor malignancy but also provide novel therapeutic targets.
RESULTS: To identify EpiSC genes in late (stages III-IV) EECs, a molecular signature distinguishing early (stages I-II) from late EECs was first identified to delineate late EECs at the genomic level. ERBB2 and CCR1 were activated in late EECs, while the tumor suppressors APBA2 (MINT2) and the CDK inhibitor p16 were enriched in early EECs. The MAPK pathway was significantly up-regulated in late EECs, indicating that drugs targeting this canonical pathway might be useful for treating advanced EECs. A six-gene mini-signature was further identified that differentiates early from advanced EECs in both the training and testing datasets. Advanced, invasive EECs possessed a clear EpiSC gene expression pattern, which partly explains why these tumors are more malignant.
CONCLUSIONS: Our work provides new insights into the pathogenesis of EECs and reveals a previously unknown link between adult stem cells and the histopathological traits of EECs. EpiSC genes shared with late EECs may contribute to the stem cell-like phenotypes shown by advanced tumors and have the potential to serve as candidate therapeutic targets and novel prognostic biomarkers.

Year:  2009        PMID: 20015385      PMCID: PMC2810306          DOI: 10.1186/1471-2164-10-613

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Tumor development, progression, and prognosis remain at the forefront of medical research. Two hypotheses on the origin of cancer have existed for many decades. One postulates that an adult stem or precursor cell is the cell of origin for cancer, whereas the other holds that a somatic cell can be mutated and then dedifferentiated or reprogrammed to gain properties associated with both cancer cells and stem cells [1-3]. The discovery of a subpopulation of tumor stem cells (TSCs) in leukemia and solid cancers has strengthened the stem cell hypothesis [4]. Glioblastomas also possess characteristics and gene expression patterns of local neural stem cells (NSCs) [5], and artificially introducing cancer-associated mutations into stem or lineage-restricted precursor cells can indeed turn them into cancer-initiating cells: all mice that received the mutations developed medulloblastomas [6,7]. Another example of an adult stem cell acting as the cell of origin of cancer has recently been reported in chronic myeloid leukemia (CML): restricting BCR-ABLp210 expression to mouse Sca1(+) hematopoietic stem cells is sufficient to induce CML that recapitulates the human disease [8]. This evidence supports the idea that mutations in stem cells may initiate the carcinogenic process of certain, although not necessarily all, tumors. On the other hand, the importance of somatic or tumor cell mutation and dedifferentiation has not been excluded completely. It has been recognized that during malignant transformation, cancer cells acquire genetic mutations that override the normal mechanisms controlling cellular proliferation. Human tumor cells can be created from healthy somatic cells with defined genetic elements [9]. Even if cancers originate from mutated stem cells, newly acquired mutations in tumors still contribute to cell malignancy and therapy resistance.
It has been recognized that cancer cells acquire characteristics reminiscent of those of normal stem cells. Clinically, cancer cells with a poorly differentiated pathological grade usually respond worse to therapy than those with well-differentiated morphology. The degree of embryonic gene re-expression correlates with pivotal tumor features and patient prognosis [10,11]. Colon cancers are known to adopt a broad program encompassing embryonic colon development [12]. Poorly differentiated breast cancers, gliomas and bladder carcinomas exhibit an embryonic stem cell (ESC)-like gene expression signature, and the degree of ESC program recapitulation correlates with tumor stage and patient survival [13]. Recent studies demonstrated that Snail, a potent oncogene that can induce epithelial-mesenchymal transition (EMT), contributes to the acquisition of stem cell traits in breast cancer cells [14,15]. Pre-existing cancerous lesions may become more malignant through the accumulation of new oncogenic mutations (such as Snail) that can induce cell dedifferentiation. Identifying genes shared between transformed cells, especially the more malignant ones, and stem cells will help to unmask the pathogenesis of tumors and provide novel therapeutic targets and prognostic biomarkers.
Endometrial carcinoma of the female genital tract can be divided into two forms: endometrial endometrioid carcinoma (EEC; Type I), which accounts for 70-80% of cases and is estrogen-related, and Type II tumors (papillary serous or clear cell tumors), which account for 20% of cases and are unrelated to estrogen stimulation [16]. Clinically, endometrial carcinoma is classified into 4 FIGO stages by the degree of invasion and metastasis: stage I tumors are limited to the uterine body and stage II tumors extend to the uterine cervix. Both stages are considered less invasive, although stage IIB cases are characterized by a less favorable prognosis. In contrast, tumors of stages III-IV are invasive: in stage III there is regional tumor spread, and in stage IV there is bulky pelvic disease or distant spread [17]. Approximately 72% of endometrial carcinomas are stage I, 12% are stage II, 13% are stage III, and 3% are stage IV [17]. The survival rate is also lower in patients with higher-stage tumors: 80-90% in stage I, 70-80% in stage II, 40-60% in stage III, and 20% in stage IV [17]. Identifying genes abundant in late EECs can not only unmask the mechanisms of tumor malignancy but also provide novel therapeutic targets.
Recently, Lgr5- or CD133-positive crypt stem cells of the intestinal tract were identified, and these cells were shown to be one of the cells of origin of intestinal cancer [18,19]. OLFM4 is also a new, robust marker for stem cells in the human intestine and marks a subset of colorectal cancer cells [20]. Disruption of beta-catenin in cells positive for CD133 resulted in a gross disruption of crypt architecture and a disproportionate expansion of CD133(+) cells at the crypt base [19]. It is unclear, however, whether genes highly expressed in epithelial stem cells (EpiSCs) also contribute to tumor invasiveness, malignancy and therapy resistance. A broad description of the stem cell traits in EECs is therefore crucial.
In this study we addressed the molecular basis of endometrial cancer and assessed the expression of epithelial precursor genes in advanced EEC. To examine the genes shared between EpiSCs and late EECs, we first needed to unmask the gene composition of EECs at different stages. For this purpose we applied gene expression microarrays and machine learning algorithms to filter genes differentially expressed between early (stages I-II) and late (stages III-IV) EECs. After obtaining the genes unique to EECs of different stages, we related the transcriptional programs of EpiSCs and late EECs. This approach identified a total of 217 probe sets differentiating EECs of different stages and, moreover, showed that late EECs possess a clear EpiSC gene expression pattern, partly explaining why these tumors are more malignant and fatal.

Results

Molecular signatures of early and late stage EECs

To identify epithelial stem cell genes in late EECs, we first delineated early (FIGO stages I and II) and late (FIGO stages III and IV) EECs at the genomic level. We explored genes differentially expressed between early and late EEC tissues using the Affymetrix U133 Plus 2.0 array. The demographics of patients in the training and testing cohorts are given in Tables 1 and 2, respectively. Tumor samples were compared with each other to minimize the influence of stromal and myometrial contamination as well as female-specific genes. A multidimensional scaling (MDS) plot using the whole transcriptome showed that the mRNA profiles of normal and cancerous tissues differ (Figure 1A). We then searched for genes distinguishing early from late EECs following a statistical pipeline we have used [21,22]. A total of 678 probe sets could differentiate early from late stage samples, as well as discriminate 23 normal endometrium and 33 tumor tissues (Figure 1B; the positive false discovery rate (pFDR) cutoff q values are shown).
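The text reports pFDR q-value cutoffs but does not spell out the computation at this point. As a rough illustration of how per-probe-set p-values can be turned into FDR-adjusted values, the sketch below uses the Benjamini-Hochberg adjustment, a closely related and more conservative stand-in, not the authors' pFDR pipeline. The function name `bh_adjust` and the toy p-values are assumptions made for illustration only.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    NOTE: illustrative stand-in for the pFDR q-values mentioned in the text;
    it is not the pipeline of refs [21,22].
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five probe sets (not data from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_q = bh_adjust(toy_p)
selected = [i for i, q in enumerate(toy_q) if q < 0.05]
print(selected)
```

In practice the cutoff is applied to the adjusted values, so a probe set with a raw p-value below 0.05 can still be rejected once multiple testing across tens of thousands of probe sets is taken into account.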
Table 1

Characteristics of 34 EEC patients used in the training cohort.

GSE No. | TNM | FIGO stage | Histology | FIGO grade | Patient Age | Ethnic Background
(Isolation site: Endometrium)
GSM117600 | T1aN0M0 | 1A | Adenocarcinoma | 1 | 60-70 | Asian
GSM152644 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM152660 | T1bN0M0 | 1B | Endometrioid | 2 | 40-50 | Caucasian
GSM137960 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM137968 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM137980 | T1bN0M0 | 1B | Endometrioid | 3 | 40-50 | Caucasian
GSM117586 | T1bN0M0 | 1B | Endometrioid | 2 | 50-60 | African-American
GSM117643 | T1bN0M0 | 1B | Endometrioid | 1 | 70-80 | Caucasian
GSM117667 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM117703 | T1bN0M0 | 1B | Endometrioid | 2 | 50-60 | Caucasian
GSM117704 | T1bN0M0 | 1B | Endometrioid | 2 | 50-60 | Caucasian
GSM117722 | T1bN0M0 | 1B | Endometrioid | 2 | 70-80 | Caucasian
GSM117724 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM117739 | T1bN0M0 | 1B | Endometrioid | 3 | 60-70 | Caucasian
GSM89034 | T1bN0 | | Endometrioid | 2 | 40-50 | Caucasian
GSM89089 | T1bN0M0 | 1B | Endometrioid | 1 | 70-80 | Caucasian
GSM76499 | T1bN0M0 | 1B | Endometrioid | 2 | 70-80 | Caucasian
GSM76638 | T1bN0M0 | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM117697 | T1cN0M0 | 1C | Endometrioid | 3 | 60-70 | Caucasian
GSM89076 | T1cN0M0 | 1C | Endometrioid | 3 | 70-80 | African Indian
GSM76507 | T1cN0M0 | 1C | Endometrioid | 2 | 60-70 | Caucasian
GSM137955 | T2aN0M0 | 2A | Endometrioid | 2 | 60-70 | African-American
GSM102425 | T2bN0M0 | 2B | Endometrioid | 2 | 50-60 | Caucasian
GSM102444 | T2bN0M0 | 2B | Endometrioid | 1 | 60-70 | Caucasian
GSM46912 | T2bN0M0 | 2B | Endometrioid | 1 | 60-70 | Caucasian
GSM117708 | T3aN0M0 | 3A | Endometrioid | 3 | 70-80 | Caucasian
GSM117712 | T3aN0M0 | 3A | Endometrioid | 2 | 60-70 | Caucasian
GSM38067 | T3aN0M0 | 3A | Endometrioid | 3 | 60-70 | Caucasian
GSM38084 | T4NXM0 | 4A | Endometrioid | 3 | 60-70 | Caucasian
GSM89087 | T3aNXM1 (*) | 4B | Endometrioid | 3 | 80-90 | Caucasian
GSM46867 | T3aN1M1 (**) | 4B | Endometrioid | 3 | 60-70 | Caucasian
(Isolation site: outside endometrium)
GSM89079 | T3aNXM1 ($) | 4B | Endometrioid | 3 | 40-50 | Caucasian
GSM203686 | T3aN0M0 ($) | 3A | Endometrioid | 2 | 60-70 | Caucasian
GSM46932 | TXNXM1 (@) | 4B | Endometrioid | 2 | 50-60 | Caucasian

Endometrioid: Endometrioid carcinoma

*: Hepatic metastasis

**: Lymph node metastasis

$: Isolated from ovary

@: Isolated from abdominal wall fascia

Table 2

Characteristics of another 15 early EEC patients used in the testing set.

GSE No. | TNM | FIGO stage | Histology | FIGO grade | Patient Age | Ethnic Background
GSM88952 | T1aN0M0 | 1A | Endometrioid | 2 | 50-60 | Caucasian
GSM76487 | T1bN0M0 | 1B | Endometrioid | 1 | 30-40 | Caucasian
GSM102469 | T1bN0M0 | 1B | Endometrioid | 1 | 60-70 | Caucasian
GSM117579 | T1bN0MX | 1B | Endometrioid | 1 | 80-90 | African-American
GSM117589 | T1bN0MX | 1B | Endometrioid | 1 | 80-90 | Caucasian
GSM117767 | T1bN0MX | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM137961 | T1bN0MX | 1B | Endometrioid | 2 | 60-70 | Caucasian
GSM117590 | T1cN0MX | 1C | Endometrioid | 2 | 70-80 | Caucasian
GSM117729 | T2aN0M0 | 2A | Endometrioid | 1 | 50-60 | Caucasian
GSM53176 | T2aN0MX | 2A | Endometrioid | 1 | 60-70 | Caucasian
GSM117582 | T2aN0MX | 2A | Endometrioid | 2 | 40-50 | Caucasian
GSM88966 | T2aN1M0 | 2A | Endometrioid | 2 | 80-90 | Caucasian
GSM53174 | T2bN0MX | 2B | Adenosarcoma | 2 | 60-70 | Caucasian
GSM76525 | T1aN0M0 | 1A | Endometrioid (Mix) | 3 | 80-90 | Caucasian
GSM76632 | T1bN0M0 | 1B | Adenocarcinoma | 2 | 50-60 | Caucasian

Mix: Mixed endometrioid and serous adenocarcinoma

Figure 1

Identification of genes in different EECs. (A) A multidimensional scaling (MDS) plot drawn using all probe sets (~54,600) on the chip. Normal endometrium (Normal) and EECs of all 4 stages are included. Each spot represents an array. (B) A Venn diagram summarizing genes differentially expressed between normal and tumor tissues or between early (Stages 1 & 2) and late (Stages 3 & 4) EEC samples in the training cohort. (C) Narrowing down the existing gene signature using a machine learning strategy. When probe sets were ranked by signal-to-noise ratios (weights), the top 217 features formed the largest panel giving the lowest error rate (i.e., the best classification; upper panel). (D) The discrimination ability of the 217-probe-set signature. A prediction strength plot [25] shows the prediction strengths of the identified 217 probe sets in discriminating early from late EECs in the training cohort. Samples 1B and 2B denote 2 early EECs (Stages 1B and 2B, respectively) that express late EEC gene signatures. (E) An MDS plot using the above 217 probe sets. The 2 misgrouped early EECs are indicated. (F) Signature evaluation with an independent testing data set. One Stage 1B case, which expresses late EEC gene signatures, is grouped into the late EEC area (separated by a red line).

The discrimination ability of these 678 probe sets was evaluated by a supervised machine learning strategy that combines the weighted voting algorithm with leave-one-out cross-validation (LOOCV) [23-25]. An error rate of 12.1% (2 out of 24 early cancers and 2 out of 9 late samples; P < 0.001 by permutation test) was found (Figure 1C and Additional file 1). However, we found that the top 217 features (ranked by the weight of each probe set [25]) form the largest panel with better discrimination ability than the 678-probe-set signature (error rate 6.1% vs. 12.1%; Figure 1C, upper panel): 2 out of 24 early EEC tissues are classified into the late group, while all 9 late ones are classified correctly (Figure 1D). MDS analysis supports the superior classification power of these 217 probe sets: only 2 early samples express the late EEC gene signature and group with the late cases (Figure 1E). When these 217 probe sets were applied to an independent testing data set containing 15 early EEC cases, 1 out of 15 early tissues (error rate 6.7%; P < 0.001 by permutation test) was misgrouped (Figure 1F).
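The combination of signal-to-noise ranking, weighted voting and LOOCV described above can be sketched as follows. This is a minimal illustration of the general weighted-voting scheme (signal-to-noise weight, midpoint decision boundary, prediction strength), not the authors' code; all function names and the toy expression values are assumptions, and details such as feature selection inside each fold are one reasonable choice among several.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Weight of one feature: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(values_a) + stdev(values_b)
    return (mean(values_a) - mean(values_b)) / spread if spread else 0.0


def train(samples, labels, top_k):
    """Rank features by |weight| and keep the top_k (needs >= 2 samples per class)."""
    model = []
    for g in range(len(samples[0])):
        early = [s[g] for s, y in zip(samples, labels) if y == "early"]
        late = [s[g] for s, y in zip(samples, labels) if y == "late"]
        weight = signal_to_noise(late, early)      # positive: higher in late
        boundary = (mean(early) + mean(late)) / 2  # midpoint between class means
        model.append((g, weight, boundary))
    model.sort(key=lambda t: abs(t[1]), reverse=True)
    return model[:top_k]


def predict(model, sample):
    """Return (predicted label, prediction strength in [0, 1])."""
    v_late = v_early = 0.0
    for g, weight, boundary in model:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            v_late += vote
        else:
            v_early -= vote
    total = v_late + v_early
    strength = abs(v_late - v_early) / total if total else 0.0
    return ("late" if v_late > v_early else "early"), strength


def loocv_error(samples, labels, top_k):
    """Leave each sample out once, retrain, and count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], top_k)
        predicted, _ = predict(model, samples[i])
        errors += predicted != labels[i]
    return errors / len(samples)


# Hypothetical toy data: 3 features, 4 early and 4 late samples.
toy_x = [[1.0, 5.0, 3.0], [1.2, 5.3, 2.0], [0.9, 4.8, 3.5], [1.1, 5.1, 2.5],
         [3.0, 2.0, 3.1], [3.2, 2.2, 2.2], [2.9, 1.9, 3.4], [3.1, 2.1, 2.6]]
toy_y = ["early"] * 4 + ["late"] * 4
print(loocv_error(toy_x, toy_y, top_k=2))
```

Re-ranking the features inside every fold, as done here, keeps the held-out sample from influencing feature selection; whether the original analysis did the same is not stated in this section.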

In-depth exploration of EEC-related genes

To see how the filtered genes are distributed between early and late EECs, we drew a gene expression heat map for the 217 probe sets (Figure 2). The heat map shows distinct gene expression patterns in early and late EEC tumor tissues. Consistent with the classification data obtained by prediction strength (PS) analysis in Figure 1D, hierarchical clustering showed that only 2 early cases in the training data set were misclassified (indicated by arrows; Figure 2).
Figure 2

Molecular fingerprint of EEC subtypes. A heat map shows the 217 probe sets differentiating early and late EECs in the training data set, as well as discriminating normal endometrium and tumor tissues. Columns represent tumor samples; rows represent probe sets. In red, increased; in blue, decreased. Arrows indicate two early EECs that express a late EEC gene signature (black, Stage 1B; red, Stage 2B).

Those 217 probe sets correspond to 177 known genes (with gene symbols) and 29 cDNAs to which no gene symbols have been assigned yet (all in Additional file 2). Among them, 58 genes/cDNAs are predominantly up in early EECs while 25 are down (Figure 2). In contrast, 48 genes/cDNAs are particularly high in late EECs while another 75 are low (Figure 2). The details of the known genes (especially those with known function) are in Tables 3, 4, 5, 6 and 7. Many of these genes, such as CD163 [26], MSR1 (CD204) [27], the ERBB2 oncogene (also known as HER-2/neu) [28,29], CSTA (stefin A) [30] and CCR1 [31], have been associated with tumor malignancy and poor patient outcomes in EEC or other cancers (Table 3, bold). CD163 and MSR1 (macrophage scavenger receptor 1; CD204) are markers for M2 macrophages, whose infiltration in tumor lesions is correlated with the histological grade of gliomas [27] (Table 3, bold). These consistent findings support the reliability of our gene lists. We also validated our array data by immunohistochemical staining of Taiwanese EEC cases. ERBB2 was indeed more abundant in stage III and IV EEC tissues (Figure 3).
Table 3

Up-regulated known genes in late stage EECs.

Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location
213532_at | Hs.404914 | ADAM metallopeptidase domain 17 | ADAM17 | chr2p25
223660_at | Hs.281342 | adenosine A3 receptor | ADORA3 | chr1p13.2
200966_x_at | Hs.513490 | aldolase A, fructose-bisphosphate | ALDOA | chr16q22-q24
205568_at | Hs.104624 | aquaporin 9 | AQP9 | chr15q22.1-22.2
224376_s_at | Hs.584985 | chromosome 20 open reading frame 24 | C20orf24 | chr20q11.23
224972_at | Hs.472564 | chromosome 20 open reading frame 52 | C20orf52 | chr20q11.22
200625_s_at | Hs.370581 | CAP, adenylate cyclase-associated protein 1 | CAP1 | chr1p34.2
201850_at | Hs.516155 | capping protein (actin filament), gelsolin-like | CAPG | chr2p11.2
205098_at | Hs.301921 | chemokine (C-C motif) receptor 1 | CCR1 | chr3p21
203645_s_at | Hs.504641 | CD163 molecule | CD163 | chr12p13.3
209396_s_at | Hs.382202 | chitinase 3-like 1 (cartilage glycoprotein-39) | CHI3L1 | chr1q32.1
204971_at | Hs.518198 | cystatin A (stefin A) | CSTA | chr3q21
202190_at | Hs.172865 | cleavage stimulation factor, subunit 1, 50kDa | CSTF1 | chr20q13.31
1554863_s_at | Hs.473133 | docking protein 5 | DOK5 | chr20q13.2
224336_s_at | Hs.536535 | dual specificity phosphatase 16 | DUSP16 | chr12p13
218282_at | Hs.632276 | ER degradation enhancer mannosidase a-like 2 | EDEM2 | chr20q11.22
216836_s_at | Hs.446352 | v-erb-b2 erythroblastic leukemia viral oncogene | ERBB2 (HER2) | chr17q11.2-q12
203561_at | Hs.78864 | IgG Fc fragment, IIa, receptor (CD32) | FCGR2A | chr1q23
210889_s_at | Hs.352642 | IgG Fc fragment, IIb, receptor (CD32) | FCGR2B | chr1q23
210992_x_at | Hs.78864 | IgG Fc fragment, IIc, receptor (CD32) | FCGR2C | chr1q23.3
204007_at | Hs.176663 | IgG Fc fragment, IIIb, receptor (CD16b) | FCGR3B | chr1q23
217782_s_at | Hs.268530 | G protein pathway suppressor 1 | GPS1 | chr17q25.3
212355_at | Hs.558466 | KIAA0323 | KIAA0323 | chr14q12
203364_s_at | Hs.410092 | KIAA0652 | KIAA0652 | chr11p11.2
230252_at | Hs.155538 | lysophosphatidic acid receptor 5 | LPAR5 | chr12p13.31
228360_at | Hs.357567 | hypothetical protein LOC130576 | LOC130576 | chr2q23.2
226710_at | Hs.105685 | similar to RIKEN cDNA C030006K11 gene | MGC70857 | chr8q24
224324_at | Hs.131072 | maestro | MRO | chr18q21
226241_s_at | Hs.355935 | mitochondrial ribosomal protein L52 | MRPL52 | chr14q11.2
214770_at | Hs.632045 | macrophage scavenger receptor 1 | MSR1 (CD204) | chr8p22
205460_at | Hs.156832 | neuronal PAS domain protein 2 | NPAS2 | chr2q11.2
209222_s_at | Hs.473254 | oxysterol binding protein-like 2 | OSBPL2 | chr20q13.3
210907_s_at | Hs.478150 | programmed cell death 10 | PDCD10 | chr3q26.1
238693_at | Hs.529592 | Polyhomeotic like 3 (Drosophila) | PHC3 | chr3q26.2
203691_at | Hs.112341 | peptidase inhibitor 3, skin-derived (SKALP) | PI3 | chr20q12-q13
226577_at | Hs.593811 | Presenilin 1 (Alzheimer disease 3) | PSEN1 | chr14q24.3
217811_at | Hs.369052 | selenoprotein T | SELT | chr3q25.1
222523_at | Hs.401388 | SUMO1/sentrin/SMT3 specific peptidase 2 | SENP2 | chr3q27.2
227518_at | Hs.585896 | solute carrier family 35, member E1 | SLC35E1 | chr19p13.11
1552671_a_at | Hs.496057 | solute carrier family 9 (Na/H exchanger), 7 | SLC9A7 | chrXp11.3-11.23
222410_s_at | Hs.583855 | sorting nexin 6 | SNX6 | chr14q13.2
203114_at | Hs.25723 | Sjogren's syndrome/scleroderma autoantigen 1 | SSSCA1 | chr11q13.1
223478_at | Hs.530373 | translocase of inner mitochondrial 8 homolog B | TIMM8B | chr11q23.1-q23.2
212769_at | Hs.287362 | transducin-like enhancer of split 3 | TLE3 | chr15q22
204787_at | Hs.8904 | V-set and immunoglobulin domain containing 4 | VSIG4 | chrXq12-q13.3
221247_s_at | Hs.900069 | Williams-Beuren syndrome region 16 | WBSCR16 | chr7q11.23
202939_at | Hs.591501 | zinc metallopeptidase (STE24 homolog, yeast) | ZMPSTE24 | chr1p34
219050_s_at | Hs.121025 | zinc finger, HIT type 2 | ZNHIT2 | chr11q13
Table 4

Down-regulated known genes in late stage EECs.

Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location
211224_s_at | Hs.158316 | ATP-binding cassette, sub-family B, 11 | ABCB11 | chr2q24
232948_at | Hs.444414 | AF4/FMR2 family, member 3 | AFF3 | chr2q11.2-q12
207133_x_at | Hs.99691 | alpha-kinase 1 | ALPK1 | chr4q25
1562271_x_at | Hs.508738 | Rho guanine nucleotide exchange factor 7 | ARHGEF7 | chr13q34
243899_at | Hs.579108 | ADP-ribosylation factor-like 17 pseudogene 1 | ARL17P1 | chr17q21.32
211076_x_at | Hs.143766 | Atrophin 1 | ATN1 | chr12p13.31
214256_at | Hs.128041 | ATPase, Class V, type 10A | ATP10A | chr15q11.2
237716_at | Hs.434253 | Chromosome 9 open reading frame 3 | C9orf3 | chr9q22.32
233844_at | Hs.522805 | CD99 molecule-like 2 | CD99L2 | chrXq28
243640_x_at | Hs.127411 | CDC14 cell division cycle 14 homolog A | CDC14A | chr1p21
233630_at | Hs.472027 | CDP-diacylglycerol synthase 2 | CDS2 | chr20p13
210701_at | Hs.461361 | craniofacial development protein 1 | CFDP1 | chr16q22.2-q22.3
238863_x_at | Hs.130849 | Component of oligomeric golgi complex 8 | COG8 | chr16q22.1
215377_at | Hs.501345 | C-terminal binding protein 2 | CTBP2 | chr10q26.13
1561616_a_at | Hs.591570 | dynein, axonemal, heavy polypeptide 6 | DNAH6 | chr2p11.2
1560042_at | Hs.591566 | family with sequence similarity 82, A | FAM82A | chr2p22.2
243588_at | Hs.403917 | FERM, RhoGEF & pleckstrin domain protein 1 | FARP1 | chr13q32.2
243876_at | Hs.189409 | Formin binding protein 1 | FNBP1 | chr9q34
1560094_at | Hs.155090 | Guanine nucleotide binding protein, β 5 | GNB5 | chr15q21.2
210855_at | Hs.467733 | GREB1 protein | GREB1 | chr2p25.1
1557289_s_at | Hs.334930 | GTF2I repeat domain containing 2 | GTF2IRD2 | chr7q11.23
232889_at | Hs.620129 | glucuronidase, beta pseudogene 1 | GUSBP1 | chr5q13.2
1555685_at | Hs.463511 | Hexose-6-phosphate dehydrogenase | H6PD | chr1p36
240482_at | Hs.519632 | Histone deacetylase 3 | HDAC3 | chr5q31
1559600_at | Hs.632767 | Hypermethylated in cancer 2 | HIC2 | chr22q11.21
1557329_at | Hs.371350 | Holocarboxylase synthetase | HLCS | chr21q22.1
1553111_a_at | Hs.534040 | kelch repeat and BTB domain containing 6 | KBTBD6 | chr13q14.11
231875_at | Hs.374201 | kinesin family member 21A | KIF21A | chr12q12
232814_x_at | Hs.20107 | Kinesin 2 | KNS2 | chr14q32.3
242112_at | Hs.631954 | LSM11, U7 small nuclear RNA associated | LSM11 | chr5q33.3
232418_at | Hs.30824 | leucine zipper transcription factor-like 1 | LZTFL1 | chr3p21.3
1560033_at | Hs.167531 | Methylcrotonoyl-Coenzyme A carboxylase 2 | MCCC2 | chr5q12-q13
216783_at | Hs.187866 | Neuroplastin | NPTN | chr15q22
217802_s_at | Hs.632458 | nuclear casein kinase and CDK substrate 1 | NUCKS1 | chr1q32.1
232644_x_at | Hs.518750 | OCIA domain containing 1 | OCIAD1 | chr4p11
233270_x_at | Hs.491148 | Pericentriolar material 1 | PCM1 | chr8p22-p21.3
1558695_at | Hs.188614 | Pleckstrin homology domain containing, A5 | PLEKHA5 | chr12p12
233458_at | Hs.460298 | polymerase (RNA) III polypeptide E | POLR3E | chr16p12.1
1566541_at | Hs.580351 | Protein kinase C, epsilon | PRKCE | chr2p21
235004_at | Hs.519904 | RNA binding motif protein 24 | RBM24 | chr6p22.3
212044_s_at | Hs.523463 | Ribosomal protein L27a | RPL27A | chr11p15
215599_at | Hs.535014 | SMA4 | SMA4 | chr5q13
1556784_at | Hs.551967 | Smith-Magenis syndrome region, candidate 7 | SMCR7 | chr17p11.2
217704_x_at | Hs.628886 | Suppressor of zeste 12 homolog pseudogene | SUZ12P | chr17q11.2
215279_at | Hs.499209 | Supervillin | SVIL | chr10p11.2
207365_x_at | Hs.435667 | thyroid hormone receptor, beta | THRB | chr3p24.2
215428_at | Hs.510833 | Tight junction protein 1 (zona occludens 1) | TJP1 | chr15q13
225004_at | Hs.514211 | transmembrane protein 101 | TMEM101 | chr17q21.31
242347_at | Hs.8752 | Transmembrane protein 4 | TMEM4 | chr12q15
238079_at | Hs.576468 | tropomyosin 3 | TPM3 | chr1q21.2
237513_at | Hs.98609 | trypsin X3 | TRY1 | chr7q34
1557571_at | Hs.439381 | Vacuolar protein sorting 13 homolog D | VPS13D | chr1p36.22
235551_at | Hs.248815 | WD repeat domain 4 | WDR4 | chr21q22.3
1555259_at | Hs.444451 | sterile alpha motif and leucine zipper kinase AZK | ZAK | chr2q24.2
Table 5

Up-regulated biological modules in late EECs.

Biological Process | % | P-Value | Genes
Regulation of catalytic activity | 12.50% | 0.0053 | DUSP16, CAP1, ADORA3, ERBB2, GPS1, PSEN1
Immune system process | 16.67% | 0.01694 | AQP9, FCGR2A, FCGR2B, FCGR2C, FCGR3B, CCR1, ERBB2, VSIG4
Second-messenger-mediated signalling | 8.33% | 0.02006 | CAP1, ADORA3, ERBB2, CCR1
Regulation of MAP kinase activity | 6.25% | 0.02205 | DUSP16, ERBB2, GPS1
Cell surface receptor linked signal transduction | 20.83% | 0.02535 | TLE3, CAP1, SENP2, ADORA3, ERBB2, LPAR5, CCR1, ADAM17, PSEN1, SNX6
Membrane organization and biogenesis | 8.33% | 0.0314 | CAP1, ZMPSTE24, MSR1, TIMM8B
Table 6

Up-regulated known genes in early stage EECs.

Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location
225054_x_at | Hs.293560 | Archaemetzincins-2 | AMZ2 | chr17q24.2
209870_s_at | Hs.525718 | amyloid beta (A4) precursor protein-binding A2 | APBA2 | chr15q11-q12
1560851_at | Hs.351856 | chromosome 10 open reading frame 136 | C10orf136 | chr10q11.21
234457_at | Hs.512758 | chromosome 6 open reading frame 12 | C6orf12 | chr6p21.33
1561271_at | Hs.328147 | coiled-coil domain containing 144C | CCDC144C | chr17p11.2
211156_at | Hs.512599 | cyclin-dependent kinase inhibitor 2A (p16) | CDKN2A | chr9p21
220335_x_at | Hs.268700 | esterase 31 | CES3 | chr16q22.1
204373_s_at | Hs.557659 | centrosomal protein 350kDa | CEP350 | chr1p36.13-q41
233502_at | Hs.12723 | Contactin 3 (plasmacytoma associated) | CNTN3 | chr3p26
244187_at | Hs.512181 | Chromosome X open reading frame 33 | CXorf33 | chrXq21.1
229738_at | Hs.577398 | dynein, axonemal, heavy polypeptide 10 | DNAH10 | chr12q24.31
219651_at | Hs.317659 | developmental pluripotency associated 4 | DPPA4 | chr3q13.13
1555118_at | Hs.441145 | ectonucleoside tri-P diphosphohydrolase 3 | ENTPD3 | chr3p21.3
206794_at | Hs.390729 | v-erb-a erythroblastic leukemia viral oncogene | ERBB4 | chr2q33.3-q34
241252_at | Hs.99480 | establishment of cohesion 1 homolog 2 | ESCO2 | chr8p21.1
209631_s_at | Hs.406094 | G protein-coupled receptor 37 | GPR37 | chr7q31
229714_at | Hs.171001 | heparan sulfate 6-O-sulfotransferase 3 | HS6ST3 | chr13q32.1
213598_at | Hs.533222 | Dimethyladenosine transferase | HSA9761 | chr5q11-q14
231500_s_at | Hs.444600 | SLC7A5 pseudogene | LAT1-3TM | chr16p11.2
232953_at | Hs.566209 | hypothetical LOC400723 | LOC400723 | chr11p15.5
239076_at | Hs.520804 | Similar to cell division cycle 10 homolog | LOC441220 | chr7p13
1558579_at | Hs.587089 | hypothetical protein LOC642691 | LOC642691 | chr2p11.1
222159_at | Hs.497626 | Plexin A2 | PLXNA2 | chr1q32.2
226766_at | Hs.13305 | roundabout, axon guidance receptor, 2 | ROBO2 | chr3p12.3
1569124_at | Hs.267765 | similar to Leucine-rich repeat protein SHOC-2 | RP11-139H14.4 | chr13q14.12
220232_at | Hs.379191 | stearoyl-CoA desaturase 5 | SCD5 | chr4q21.22
214257_s_at | Hs.534212 | SEC22 vesicle trafficking protein homolog B | SEC22B | chr1q21.1
242536_at | Hs.205816 | Solute carrier family 17, member 1 | SLC17A1 | chr6p23-p21.3
220551_at | Hs.242821 | solute carrier family 17, member 6 | SLC17A6 | chr11p14.3
1559208_at | Hs.437696 | ST7 overlapping transcript 4 (non-coding RNA) | ST7OT4 | chr7q31.1-7q31.2
233251_at | Hs.21379 | Spermatid perinuclear RNA binding protein | STRBP | chr9q33.3
223751_x_at | Hs.120551 | toll-like receptor 10 | TLR10 | chr4p14
217797_at | Hs.301412 | ubiquitin-fold modifier conjugating enzyme 1 | UFC1 | chr1q23.3
229997_at | Hs.515130 | vang-like 1 (van gogh, Drosophila) | VANGL1 | chr1p11-p13.1
204590_x_at | Hs.592009 | vacuolar protein sorting 33 homolog A | VPS33A | chr12q24.31
232964_at | Hs.488157 | Williams Beuren syndrome region 19 | WBSCR19 | chr7p13
227621_at | Hs.446091 | Wilms tumor 1 associated protein | WTAP | chr6q25-q27
240296_at | Hs.98322 | Zinc finger, A20 domain containing 1 | ZA20D1 | chr1q21.2
226208_at | Hs.593643 | zinc finger, SWIM-type containing 6 | ZSWIM6 | chr5q12.1
Table 7

Down-regulated known genes in early stage EECs.

Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location
215535_s_at | Hs.409230 | 1-acylglycerol-3-phosphate O-acyltransferase 1 | AGPAT1 | chr6p21.3
202204_s_at | Hs.295137 | autocrine motility factor receptor | AMFR | chr16q21
212536_at | Hs.478429 | ATPase, Class VI, type 11B | ATP11B | chr3q27
220975_s_at | Hs.201398 | C1q and tumor necrosis factor related protein 1 | C1QTNF1 | chr17q25.3
224794_s_at | Hs.495230 | cerebral endothelial cell adhesion molecule 1 | CEECAM1 | chr9q34.11
1557394_at | Hs.249600 | discs, large homolog-associated protein 4 | DLGAP4 | chr20q11.23
211958_at | Hs.369982 | insulin-like growth factor binding protein 5 | IGFBP5 | chr2q33-q36
225303_at | Hs.609291 | kin of IRRE like (Drosophila) | KIRREL | chr1q21-q25
218717_s_at | Hs.374191 | leprecan-like 1 | LEPREL1 | chr3q28
209205_s_at | Hs.436792 | LIM domain only 4 | LMO4 | chr1p22.3
203506_s_at | Hs.409226 | mediator of RNA polymerase II transcription 12 | MED12 | chrXq13
207564_x_at | Hs.405410 | O-linked N-acetylglucosamine transferase | OGT | chrXq13
214484_s_at | Hs.522087 | opioid receptor, sigma 1 | OPRS1 | chr9p13.3
203244_at | Hs.567327 | peroxisomal biogenesis factor 5 | PEX5 | chr12p13.3
241916_at | Hs.130759 | Phospholipid scramblase 1 | PLSCR1 | chr3q23
229001_at | Hs.601513 | Protein phosphatase 1, regulatory 3E | PPP1R3E | chr14q11.2
208720_s_at | Hs.282901 | RNA binding motif protein 39 | RBM39 | chr20q11.22
209148_at | Hs.388034 | retinoid X receptor, beta | RXRB | chr6p21.3
209352_s_at | Hs.13999 | SIN3 homolog B, transcription regulator (yeast) | SIN3B | chr19p13.11
221500_s_at | Hs.307913 | syntaxin 16 | STX16 | chr20q13.32
220036_s_at | Hs.272838 | syntaxin 6 | STX6 | chr1q25.3
201110_s_at | Hs.164226 | thrombospondin 1 | THBS1 | chr15q15
221507_at | Hs.631637 | transportin 2 (importin 3, karyopherin b 2b) | TNPO2 | chr19p13.13
208723_at | Hs.171501 | ubiquitin specific peptidase 11 | USP11 | chrXp11.23
Figure 3

ERBB2 protein expression in early and late EECs. (A) Representative immunohistochemical (IHC) staining of ERBB2 protein in primary EEC tissues. Staining results were graded as 0+: undetectable staining in <10% of the tumor cells; 2+: weak to moderate complete membrane staining (indicated by an arrow) in <10% of the tumor cells; 3+: strong complete membrane staining observed in <10% of the tumor cells. EEC cases were categorized as ERBB2-negative (scores 0 and 1+) or positive (scores 2+ and 3+). (B) A histogram summarizing the IHC results on 36 primary EEC tissues stained for ERBB2. A chi square test P value is shown. Case numbers and percentages are also indicated.

To gain more insight into the functional consequences of differential gene expression, we performed gene set enrichment analysis on the filtered genes. Signature probe sets were searched against the Gene Ontology (GO) database to find statistically over-represented functional groups within these genes. The biological processes that are statistically over-represented (P < 0.05) among late stage-enriched genes are shown in Table 5. The predominant processes are immune system process, second-messenger-mediated signaling (genes also involved in cyclic nucleotide second messenger (P = 0.0306) are in bold), regulation of MAP kinase activity (genes also involved in the inactivation of MAPK activity (P = 0.0459) are in bold), membrane organization and biogenesis, regulation of catalytic activity (genes also involved in the positive regulation of catalytic activity (P = 0.0182) are in bold), and cell surface receptor-linked signal transduction (Table 5).
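Over-representation of a GO term in a gene list is commonly scored with a one-sided hypergeometric test. The sketch below illustrates that general idea only; the section does not state which tool or test produced the P-values in Table 5. The function name is an assumption, and the background and annotation sizes in the example are hypothetical. Only the 3-of-48 overlap comes from Table 5, where 3 genes correspond to 6.25% of the 48 late-enriched genes/cDNAs.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Share of the list annotated to "Regulation of MAP kinase activity" (Table 5).
share = 3 / 48
print(f"{share:.2%}")

# Hypothetical background: 20000 genes, 150 of them annotated to the term.
p_value = hypergeom_enrichment_p(20000, 150, 48, 3)
print(p_value)
```

The resulting P-value depends strongly on the chosen background gene set, which is why the numbers here should not be compared with those in Table 5.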
For genes enriched in early EECs, the CDKN2A (P16) tumor suppressor has been found to be inversely correlated with EEC prognosis [32] (Table 6, bold). Another tumor suppressor is APBA2 (amyloid beta (A4) precursor protein-binding, family A, member 2; also known as MINT2), which is frequently methylated and silenced in colorectal carcinoma and gastric carcinoma [33]. Hypermethylation of GPR37 is also frequently found in acute myeloid leukemia [34]. In terms of oncogenes, ROBO2 (roundabout, axon guidance receptor, 2), a receptor for SLIT2, an axon guidance and cell migration factor, is associated with poor prognosis in breast cancer [35]. ESCO2 (establishment of cohesion 1 homolog 2) is tightly correlated with BRCA1-dependent and various cell-type-specific carcinogenesis [36], and the pluripotency factor DPPA4 is enriched in seminomas [37]. VANGL1 (also known as KITENIN or STB2) acts as an executor of cell motility in colon cancer cells and thereby controls cell invasion, which may contribute to metastasis [38]. The abundant expression of known oncogenes in early EECs also suggests that the early EEC cases contain a high percentage of epithelial tumor cells rather than merely stromal and myometrial contamination.

A six-gene signature distinguishing early and late EECs

When evaluating the classification effect of the filtered genes, we noticed that the top 6 genes could already distinguish early from late EECs, and that these 6 genes gave the same diagnostic power as the 217 probe sets in the training cohort (Figure 4A). The same two early cases (one Stage 1B and one Stage 2B) were misgrouped with the late ones (Figure 4B). When these 6 genes were applied to the testing data set, the lowest error rate was again achieved (Figure 4C, upper panel). Only 1 out of 15 early tissues (error rate 6.7%; P < 0.001 by permutation test) was misgrouped (Figure 4C, lower panel). The same Stage 1B sample was misclassified whether only these 6 genes or the entire 217 probe sets were applied (Figure 1F). Thus, these 6 genes have clinical potential as diagnostic biomarkers. The 6 genes are: (1) ATP-binding cassette, B (MDR/TAP), 11 (ABCB11); (2) Archaemetzincins-2 (AMZ2); (3) amyloid beta (A4) precursor protein-binding A2 (APBA2); (4) LIM domain only 4 (LMO4); (5) hypothetical protein LOC647065 (LOC647065); and (6) Homo sapiens mRNA, clone IMAGE:5759975 (cDNA FLJ12258 fis) (Table 8). AMZ2 and APBA2 are up-regulated in early EECs. ABCB11, LOC647065 and cDNA FLJ12258 fis are down-regulated in tumors, especially in late EECs, while LMO4 is down-regulated particularly in early EECs.
Figure 4

A six-gene signature dividing early and late EECs. (A) Narrowing the existing gene signature down to fewer genes. When probe sets were ranked by their signal-to-noise ratios (weights), the top 6 features formed the smallest panel giving the best classification effect. (B) A prediction strength (PS) plot showing the prediction strength of these 6 genes. They give the same classification effect as the 217-probeset signature. (C) Signature evaluation on a testing data set. The lowest error rate (upper panel) and the best classification effect (shown by a PS plot; lower panel) were achieved.

Table 8

Gene annotations of the six-gene signature.

Probe Set ID | UniGene ID | Gene Title | Gene Symbol | Chromosomal Location
233113_at | Hs.633901 | Homo sapiens, clone IMAGE:5759975, mRNA | --- | ---
211224_s_at | Hs.158316 | ATP-binding cassette, B (MDR/TAP), 11 | ABCB11 | chr2q24
225054_x_at | Hs.293560 | Archaemetzincins-2 | AMZ2 | chr17q24.2
209870_s_at | Hs.525718 | amyloid beta (A4) precursor protein-binding A2 | APBA2 | chr15q11-q12
209205_s_at | Hs.436792 | LIM domain only 4 | LMO4 | chr1p22.3
239819_at | Hs.624027 | Hypothetical protein LOC647065 | LOC647065 | chr2q23.1

Re-activation of epithelial stem cell genes in advanced EECs

Since our main goal was to identify EpiSC genes in EECs, we compared the gene expression profiles of EEC tissues of all 4 stages with that of normal CD133+ EpiSCs [39]. When the 217 genes distinguishing early and late EECs were used to compare EECs with EpiSCs, EpiSCs were clearly closest to late EECs (Figure 5A). This impression was strengthened by calculating the average linkage distances between sample groups. Compared with early EECs, EECs of both Stages III and IV were closer to EpiSCs, and to a similar extent (Figure 5B), suggesting re-expression of EpiSC features in late EECs. A total of 26 EpiSC genes were overexpressed in advanced EECs (Figure 5C). In addition, genes down-regulated in late EECs (the 77 probe sets in Figure 2) were absent in EpiSCs (Figure 5D). Most early EECs clustered together and expressed intermediate levels of EpiSC genes (Figure 5C-D), consistent with the distance analysis in Figure 5B.
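The group-to-group distance used here is defined in Methods as one minus the Pearson correlation, averaged over all pair-wise linkages, with a standard error equal to the standard deviation of the linkages divided by the square root of their number. A minimal sketch of that calculation follows; the function names and toy profiles are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(group_a, group_b):
    """Mean and standard error of 1 - r over all cross-group sample pairs."""
    links = [1.0 - pearson(a, b) for a in group_a for b in group_b]
    # Assumption: sample standard deviation; undefined for a single linkage.
    se = stdev(links) / sqrt(len(links)) if len(links) > 1 else 0.0
    return mean(links), se


if __name__ == "__main__":
    # Toy profiles only; real inputs would be the 217 probe-set values.
    stem_cells = [[1.0, 2.0, 3.0]]
    tumors = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    print(average_linkage(stem_cells, tumors))
```

Each distance lies between 0 (perfectly correlated profiles) and 2 (perfectly anti-correlated profiles).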
Figure 5

Expression of EpiSC gene patterns in EECs, especially late ones. (A) Relationships between normal endometrium, EECs of different stages in the training data set and epithelial stem cells (EpiSCs). This MDS plot was drawn using the 217 features differentiating early and late EECs. (B) Average linkage distances between tissues and EpiSCs. The same 217 probe sets were used. The confidence limits shown represent the standard error. (C) A heat map showing genes overexpressed in both EpiSCs and late EECs. Gene symbols are shown. Genes associated with tumor malignancy or stem cell biology are underlined. (D) A heat map showing the distribution patterns of the 77 probe sets down-regulated in late EECs. These genes are also absent in EpiSCs.


Discussion

EEC still ranks among the most fatal female cancers worldwide, and disease progression is very often accompanied by worse clinical outcomes and treatment failure. Identifying genes or canonical pathways associated with advanced cancer can help to unmask the mechanisms of tumor malignancy and provide novel drug targets. It has been recognized clinically that cancer cells, especially advanced and metastatic ones, possess characteristics reminiscent of those of normal stem cells. The degree of stem cell gene expression correlates with pivotal tumor features and patient prognosis [10,11,13]. Hence, identifying genes shared between late EECs and stem cells will provide new insights into cancer biology, as well as new prognostic markers and therapeutic targets. In this study, we identified a 217-probeset signature that distinguishes late (stages III-IV) from early (stages I-II) EECs (Figure 1). More array data were obtained for low-stage than for high-stage disease, which may be partly because almost 90% of EECs are diagnosed early. We combined primary and metastatic late EEC samples into one group since their molecular profiles were indistinguishable (not shown). Prostate EpiSCs were used as the comparison group since array data for endometrial stem cells are not yet available. Nevertheless, prostate CD133+ cells are epithelial stem cells and therefore good controls. Other EpiSC data should reproduce part of our findings. Our results reveal a previously unrecognized link between genes associated with EpiSC identity and the histopathological traits of EECs. It is possible that these genes contribute to the stem cell-like phenotypes of late EECs. A total of 26 EpiSC genes were overexpressed in late EECs (Figure 5C), and genes down-regulated in late EECs (Figure 2; 77 probe sets) are also absent in EpiSCs (Figure 5D). Among the 26 overexpressed genes are well-known oncogenes and stemness genes (Figure 5C, underlined).
ADAM17 (A Disintegrin and A Metalloproteinase 17), also known as tumor necrosis factor-alpha converting enzyme (TACE) or, less commonly, CD156b, is a therapeutic target in multiple diseases, since major contemporary pathologies such as cancer and inflammatory and vascular diseases seem to be connected to its cleavage abilities [40]. CAP1 (adenylate cyclase-associated protein 1), which is overexpressed in pancreatic cancers, is involved in cancer cell motility [41]. CAPG (capping protein (actin filament), gelsolin-like) also contributes to the motility of pancreatic cancer cells [42]. PDCD10 (CCM3) is involved in cerebral cavernous malformations (CCM) [43] and interacts with the Ste20-related kinase MST4 to promote cell growth and transformation via modulation of the ERK pathway [44]. PSEN1 (presenilin 1) is involved in apoptosis, is overexpressed in high-risk patients with stage I non-small cell lung cancer (NSCLC), and is part of a prognostic signature for NSCLC patients [45]. SENP2 (SUMO-specific protease 2) is highly expressed in trophoblast cells that are required for placentation, and targeted disruption of SENP2 in mice reveals its essential role in the development of all three trophoblast layers via modulation of the Mdm2-p53 pathway [46]. The appearance of these known oncogenes or stemness genes in our data supports the reliability of our gene lists. The roles of EpiSC genes in both epithelial stem cell biology and EEC malignancy will be addressed further. Several genes were previously suggested to be tumor suppressors. CSTA (cystatin A, or stefin A), a cysteine proteinase inhibitor, is implicated in preventing local and metastatic tumor spread. The risk of disease recurrence and disease-related death was higher in patients with squamous cell carcinoma of the head and neck who had low CSTA [30]. NPAS2 (neuronal PAS domain protein 2) is a circadian gene as well as a putative tumor suppressor involved in the DNA damage response [47].
PHC3 (polyhomeotic homolog 3), a component of the hPRC-H complex, associates with E2F6 during G0 and is lost in osteosarcoma tumors [48]. Validating the expression of these genes in different stages of EEC by further immunohistochemistry studies may not only reveal novel malignancy mechanisms but also present new drug targets. In the past few years, much effort has gone into exploring the mechanisms of EEC and additional molecular markers for predicting its prognosis using high-throughput genomics technology. Gene expression microarray (GEM) is a popular platform among these techniques. In this study we applied GEM and machine learning algorithms to select a 217-probeset signature for disease diagnosis. Many of the selected genes have been linked to tumor progression and malignancy, supporting the reliability of our array data. Moreover, we narrowed this 217-probeset profile down to a six-gene mini-signature that differentiates early from late EECs in the training set. This signature was validated in an independent testing cohort (Figure 4). Owing to the small number of genes in this signature, their mRNA levels in patient tissues could be checked by real-time PCR in regular clinical labs. Recently, a five-gene profile and a five-microRNA signature were identified for predicting clinical outcomes in non-small-cell lung cancer [49,50]. Whether our six-gene signature correlates with relapse-free and overall survival among patients with EEC remains to be elucidated. Whether the protein expression levels of these 6 genes correlate with their mRNA levels is also unclear. Since most of the patients in both the training and testing data sets were Caucasian (Table 1), whether this gene signature applies to patients with other genetic backgrounds should also be studied.
In our datasets we noticed that a few early EEC cases already expressed late EEC genes and therefore could not be classified correctly (Figures 1 and 2). Since patients with late and metastatic EEC tend to have a poor prognosis, whether these unusual early cases have worse clinical outcomes is an interesting question. It has been suggested that the prognostic potential of human tumors is inherent in early lesions. For example, the gene expression patterns in metastatic colorectal carcinoma are readily distinguishable from those associated with in situ tumors [24,51], yet a subset of primary tumors resembled metastatic tumors with respect to this gene-expression signature [24,51]. Very recently, Varmus and colleagues showed that when untransformed mouse mammary cells are introduced into the systemic circulation of a mouse, those cells can bypass transformation at the primary site and take up long-term residence in the lungs, but do not form ectopic tumors [52]. Husemann et al. also observed that systemic spread can be an early step in breast cancer: tumor cells can disseminate systemically from the earliest epithelial alterations and form micrometastases in bone marrow and lungs [53]. Therefore, release from dormancy of early-disseminated cancer cells may frequently account for metachronous metastasis. The metastatic potential of human tumors is encoded in the bulk of a primary tumor and, at least in a subset of patients, metastatic capability is an inherent feature of the cancer. Our EEC gene signatures therefore have potential as a novel prognostic panel. More advanced therapy and closer clinical follow-up should be considered for early-stage patients whose tumors have molecular features similar to those of EpiSCs. In advanced EECs, tumor tissues express more genes that are abundant in CD133+ EpiSCs and thus acquire a stem cell trait (Figure 5).
The expression of these EpiSC genes in late EECs may be due to the re-expression of EpiSC features in late-stage EECs, i.e., further mutations and stem cell gene reactivation in certain early EECs. The intermediate EpiSC gene expression level in early EECs supports this point (Figure 5A and 5C-D). Recent studies demonstrated that EMT contributes to the acquisition of stem cell traits in cancer cells and that induction of the EMT inducer Snail results in stemness gene expression [14,15]. Whether EMT also contributes to EEC progression and metastasis is an interesting question to follow up. However, we cannot rule out the possibility that certain late EECs arise from an independent, rapidly progressing cancer that utilizes stemness molecular pathways. According to the tumor stem cell theory, cancer cells may originate from different cancer stem cells that acquire distinctive oncogenic mutations. Certain early EECs may have the capacity to progress to late-stage disease because they arose from the same mutated progenitor cells as late EECs. The observation that several early EEC cases already express EpiSC genes (Figure 1D and 5C) is consistent with the latter hypothesis. Both situations may exist in vivo, and our profiling work cannot yet decide between them. Nevertheless, the genes selected here will provide clinicians with novel prognostic markers and therapeutic targets.

Conclusions

In summary, we reveal distinct epithelial stem cell traits and gene expression patterns in late EECs, and some of these genes have potential as novel drug targets. Drugs targeting the MAP kinase pathway, for example, may be useful for the treatment of late EEC, since this canonical pathway is significantly up-regulated in late EECs (Table 5). Because statistical analysis of gene ontology terms relies on prior knowledge of the biological activity of each differentially expressed gene, the enrichment of genes associated with specific pathways may be a consequence of intense research in those areas. Hence, new canonical pathways may still exist and may serve as candidate therapeutic targets. The functions of the selected KIAA genes (such as KIAA0323, Figure 5C) and the LOC series of anonymous ESTs (such as C20orf24, Figure 5C) in Tables 3, 4, 5, 6, 7 should be studied, and their roles in tumor malignancy, chemoresistance and EpiSC stemness remain to be elucidated. Further studies to establish the prognostic value and therapeutic potential of the identified genes, especially those also present in epithelial stem cells, should lead to a better understanding of EEC and EpiSC biology and of the susceptibility of late EECs to treatment.

Methods

Microarray data sets

All array data were generated on the Affymetrix™ HG-U133 Plus 2.0 GeneChip. Array data of normal CD133+ epithelial stem cells isolated from benign prostatic hyperplasia, which were used as a normal counterpart of cancer stem cells [39], were downloaded from the ArrayExpress database at the European Bioinformatics Institute (http://www.ebi.ac.uk/microarray-as/ae/; Accession No. E-MEXP-993; array data files 1325504978.cel, 1325505459.cel and 1325505089.cel were used). The gene expression profiles of EEC tissues of different stages were generated by the International Genomics Consortium (IGC) under the expO (Expression Project for Ontology) project and were downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/; GSE2109). EEC array data were divided into training (n = 33; including all 4 stages) and testing (n = 15) cohorts (details in Table 1). Array data of normal endometrium controls were from a human body index dataset in GEO (GSE7307).

Array data processing

Feature selection was performed as previously described [22]. Briefly, the default robust multichip average (RMA) settings were used to background-correct, normalize and summarize all expression values using the 'affy' package of the Bioconductor suite of software (http://www.bioconductor.org/) for the R statistical programming language. A t-statistic was calculated in the usual way for each gene, and a p-value was then calculated using a modified permutation test in the "LIMMA" package [22]. To control multiple testing errors, a false discovery rate (FDR) algorithm was then applied to these p-values to calculate a set of q-values: thresholds of the expected proportion of false positives, or false rejections of the null hypothesis [22,54]. Gene annotation was performed with the ArrayFusion web tool (http://microarray.ym.edu.tw/tools/arrayfusion/) [55]. Gene enrichment analysis against the Gene Ontology (GO) database was performed using the DAVID Bioinformatics Resources 2008 interface (http://david.abcc.ncifcrf.gov/), a graph theory evidence-based method to agglomerate gene or protein identifiers [56,57].
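The text does not name the FDR algorithm used to turn p-values into q-values. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, which is one common choice; the function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here purely for illustration; the method
    actually applied in the study is described in refs [22,54].
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below a chosen threshold are kept, and that threshold bounds the expected proportion of false positives among them.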

Bioinformatics analysis

The discrimination power of the filtered genes was evaluated by a machine-learning approach combining the weighted voting algorithm [24] and leave-one-out cross-validation (LOOCV). This approach has been integrated into our Java tool (http://microarray.ym.edu.tw/tools/set/) [25]. In brief, the uploaded genes are ranked in descending order of the absolute values of their signal-to-noise scores [24]. Genes are added to a signature one at a time following this ranking. The error rate for each new signature is estimated by the weighted voting algorithm and LOOCV and can be monitored in an error rate distribution plot [25]. Based on the error rate information, we then selected the composition of discriminating genes with the lowest error rate. Once a signature is defined, the prediction strength (PS) for each sample is shown. PS values range from -1 to +1, where higher absolute values reflect stronger predictions [25]. An overview of the results for samples in different groups is then illustrated by a PS plot [25]. Classical multidimensional scaling (MDS) was performed with the standard function of the R program to provide a visual impression of how the various sample groups are related. The distance between two samples was calculated as the Pearson correlation subtracted from unity, giving bounded distances in the range (0, 2), as described in our previous study [22]. The distance between two groups of samples was calculated using the average linkage measure (the mean of all pair-wise distances (linkages) between members of the two groups concerned). The standard error of the average linkage distance between two groups (the standard deviation of pair-wise linkages divided by the square root of the number of linkages) is quoted when inter-group distances are compared in the text.
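The ranking, voting and cross-validation steps described above can be sketched as follows. This is not the code of the Java tool [25]; it assumes the commonly used signal-to-noise definition (difference of class means divided by the sum of class standard deviations) and a signed PS that is positive for an "early" call. All function names and the toy data are hypothetical, and genes are re-ranked inside each cross-validation fold, which is a design choice of this sketch.

```python
from statistics import mean, stdev


def signal_to_noise(early_vals, late_vals):
    """(mean_early - mean_late) / (sd_early + sd_late) for one gene."""
    denom = stdev(early_vals) + stdev(late_vals)
    return (mean(early_vals) - mean(late_vals)) / (denom or 1e-12)


def train(samples, labels, n_genes):
    """Keep the n_genes with the largest |signal-to-noise| weights."""
    early = [s for s, l in zip(samples, labels) if l == "early"]
    late = [s for s, l in zip(samples, labels) if l == "late"]
    model = []
    for g in range(len(samples[0])):
        e_vals = [s[g] for s in early]
        l_vals = [s[g] for s in late]
        weight = signal_to_noise(e_vals, l_vals)
        boundary = (mean(e_vals) + mean(l_vals)) / 2
        model.append((g, weight, boundary))
    model.sort(key=lambda t: abs(t[1]), reverse=True)
    return model[:n_genes]


def prediction_strength(model, sample):
    """Signed PS in [-1, 1]; positive values vote for 'early'."""
    votes = [w * (sample[g] - b) for g, w, b in model]
    v_early = sum(v for v in votes if v > 0)
    v_late = -sum(v for v in votes if v < 0)
    total = v_early + v_late
    return 0.0 if total == 0 else (v_early - v_late) / total


def loocv_error_rate(samples, labels, n_genes):
    """Leave each sample out once, retrain, and count wrong calls."""
    errors = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:],
                      labels[:i] + labels[i + 1:], n_genes)
        ps = prediction_strength(model, samples[i])
        predicted = "early" if ps > 0 else "late"  # ties go to 'late'
        errors += predicted != labels[i]
    return errors / len(samples)


if __name__ == "__main__":
    # Toy data: 3 genes, 4 early and 4 late samples.
    samples = [[5.0, 1.0, 3.0], [5.2, 1.1, 2.0], [4.8, 0.9, 3.5], [5.1, 1.2, 2.5],
               [2.0, 3.0, 3.1], [2.2, 3.1, 2.2], [1.8, 2.9, 3.4], [2.1, 3.2, 2.6]]
    labels = ["early"] * 4 + ["late"] * 4
    for k in (1, 2, 3):
        print(k, loocv_error_rate(samples, labels, k))
```

Calling `loocv_error_rate` for increasing signature sizes, as in the loop at the end, gives the error-rate curve from which the smallest best-performing panel would be chosen.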

Immunohistochemical staining

Staining was performed on formalin-fixed, paraffin-embedded specimens using an anti-ERBB2 primary antibody (DAKO, Carpinteria, CA, USA). Scoring was performed as follows. 0: undetectable staining or membrane staining in <10% of the tumor cells; 1+: faint and incomplete membrane staining in >10% of the tumor cells; 2+: weak to moderate complete membrane staining in >10% of the tumor cells; 3+: strong complete membrane staining in >10% of the tumor cells. ERBB2 protein expression was categorized as negative (scores 0 and 1+) or positive (scores 2+ and 3+) [29].
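The scoring rules above can be written as a small decision function. This is an illustrative sketch, not part of the published protocol: the function names and intensity labels are hypothetical, a sample with exactly 10% stained cells is assigned to score 0, and any staining pattern the text does not cover (for example strong but incomplete staining) falls back to 1+; both of these are assumptions.

```python
def erbb2_score(intensity, complete, percent_stained):
    """Return the IHC score (0, 1, 2 or 3) for one tumor sample.

    intensity:       'none', 'faint', 'weak', 'moderate' or 'strong'
    complete:        True if membrane staining is complete
    percent_stained: percentage of tumor cells with membrane staining
    """
    if intensity == "none" or percent_stained <= 10:
        return 0  # assumption: exactly 10% is treated as score 0
    if intensity == "strong" and complete:
        return 3
    if intensity in ("weak", "moderate") and complete:
        return 2
    return 1  # faint/incomplete staining, plus uncovered patterns (assumption)


def is_erbb2_positive(score):
    """Scores 2+ and 3+ count as ERBB2-positive; 0 and 1+ as negative."""
    return score >= 2


if __name__ == "__main__":
    score = erbb2_score("moderate", True, 40)
    print(score, is_erbb2_positive(score))
```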

Authors' contributions

SJC, TYW, and HWW designed the study. SJC and TYW collected the microarray data sets and EEC materials. SJC, TYW, CYT, and TFW executed the project plan and data analysis. SJC, TYW, MDC, and HWW carried out data interpretation and discussion. SJC wrote the manuscript and HWW revised it. All authors read and approved the final manuscript.

Additional file 1

The discrimination ability of the 678 probe sets. Prediction power of the 678 probe sets differentiating early and late stage samples, as well as discriminating normal endometrium and tumor tissues.

Additional file 2

The annotation of probed genes and cDNAs. Complete data of analyzed arrays and clustered genes/cDNAs.