
KLF3 and PAX6 are candidate driver genes in late-stage, MSI-hypermutated endometrioid endometrial carcinomas.

Meghan L Rudd, Nancy F Hansen, Xiaolu Zhang, Mary Ellen Urick, Suiyuan Zhang, Maria J Merino, James C Mullikin, Lawrence C Brody, Daphne W Bell.

Abstract

Endometrioid endometrial carcinomas (EECs) are the most common histological subtype of uterine cancer. Late-stage disease is an adverse prognosticator for EEC. The purpose of this study was to analyze EEC exome mutation data to identify late-stage-specific statistically significantly mutated genes (SMGs), which represent candidate driver genes potentially associated with disease progression. We exome sequenced 15 late-stage (stage III or IV) non-ultramutated EECs and paired non-tumor DNAs; somatic variants were called using Strelka, Shimmer, SomaticSniper and MuTect. Additionally, somatic mutation calls were extracted from The Cancer Genome Atlas (TCGA) data for 66 late-stage and 270 early-stage (stage I or II) non-ultramutated EECs. MutSigCV (v1.4) was used to annotate SMGs in the two late-stage cohorts and to derive p-values for all mutated genes in the early-stage cohort. To test whether late-stage SMGs are statistically significantly mutated in early-stage tumors, q-values for late-stage SMGs were re-calculated from the MutSigCV (v1.4) early-stage p-values, adjusting for the number of late-stage SMGs tested. We identified 14 SMGs in the combined late-stage EEC cohorts. When the 14 late-stage SMGs were examined in the TCGA early-stage data, only Krüppel-like factor 3 (KLF3) and Paired box 6 (PAX6) failed to reach significance as early-stage SMGs, despite the inclusion of enough early-stage cases to ensure adequate statistical power. Within TCGA, nonsynonymous mutations in KLF3 and PAX6 were, respectively, exclusive or nearly exclusive to the microsatellite instability (MSI)-hypermutated molecular subgroup and were dominated by insertions-deletions at homopolymer tracts. In conclusion, our findings are hypothesis-generating and suggest that KLF3 and PAX6, which encode transcription factors, are MSI target genes and late-stage-specific SMGs in EEC.

Year:  2022        PMID: 35081118      PMCID: PMC8791453          DOI: 10.1371/journal.pone.0251286

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Defects in mismatch repair can result in DNA strand slippage and the appearance of microsatellite instability (MSI) [1]. MSI is common in endometrial carcinoma (EC) in which it occurs in ~30% of sporadic tumors. In this context, MSI generally results from MLH1 hypermethylation and is associated with a hypermutated genome [2-4]. MSI/hypermutated ECs are one of four distinct molecular subgroups of EC, defined by The Cancer Genome Atlas (TCGA) [2]. The three remaining subgroups are referred to as POLE/ultramutated, copy number-low/microsatellite stable (MSS), and copy number-high (serous-like) [2]. Each molecular subgroup has distinct clinical outcomes [2] (and reviewed in [5]) and the prognostic utility of this molecular classification is an area of active exploration. Endometrial carcinoma (EC) exacts a significant toll on women’s health. It resulted in 89,929 deaths globally in 2018 [6], and is projected to cause 12,940 deaths within the United States in 2021 [7]. Importantly, EC incidence is increasing annually in the US and many other countries [8]. This phenomenon is likely partly due to increasing rates of obesity [9], a well-recognized epidemiological risk factor for endometrioid endometrial carcinomas (EECs) that make up 75%-80% of all newly diagnosed endometrial tumors. EECs most often present as low-grade, early-stage (stage I or II) tumors, that are confined within the uterus [10]. Five-year survival rates for patients with low-grade, early-stage disease are high because surgery is often curative for this patient population, due to the limited extent of disease [10]. In contrast, patients with late-stage EEC have relatively poor outcomes [11], despite more aggressive treatment approaches of surgery with adjuvant chemotherapy or radiotherapy [12-14]. Thus, increasing tumor stage is an adverse prognosticator for EEC that is used in the clinical setting, as are high tumor grade (Grade 3; G3), and extent of lymphovascular space invasion [15]. 
The prognostic utility of molecular classification, according to POLE, microsatellite instability (MSI), and TP53/p53 status, is an area of active exploration originating from The Cancer Genome Atlas (TCGA) discovery that EECs can be subclassified into four molecular subgroups associated with distinct clinical outcomes [2] (and reviewed in [5]). TCGA’s initial comprehensive molecular characterization of primary endometrial carcinomas included exome sequencing of 200 EECs [2]; an expanded analysis that included 188 additional EECs was subsequently reported [16]. These studies confirmed prior findings that EEC exhibits high frequencies of somatic alterations resulting in activation of the PI3-kinase pathway, the RAS-RAF-MEK-ERK pathway, and the WNT/β-catenin pathway, frequent mutations in the ARID1A (BAF250A) tumor suppressor gene, and mismatch repair defects resulting in MSI [2,16-18]. Moreover, many additional “significantly mutated genes” (SMGs), which represent candidate pathogenic driver genes, were annotated in EECs by TCGA using statistical approaches [2]. Given the dynamic nature of tumor genomes during disease initiation and progression, it is conceivable that the repertoire of pathogenic driver genes may differ in late-stage compared to early-stage EEC. However, the annotation of SMGs in primary EEC exomes by TCGA was performed in a stage-agnostic manner [2,16]. An improved understanding of the molecular etiology of late-stage EEC may provide novel insights into disease pathogenesis and progression. The aim of this study was to delineate SMGs in late-stage EEC exomes, and to determine whether these genes are also significantly mutated in early-stage disease. To this end, we exome sequenced 15 “in-house” late-stage EECs (National Human Genome Research Institute (NHGRI) cohort) and reanalyzed somatic mutation calls from 66 late-stage and 270 early-stage non-ultramutated EECs within TCGA. Collectively, we identified 14 SMGs in 81 late-stage tumors.
Krüppel-like factor 3 (KLF3) and Paired box 6 (PAX6), which encode transcription factors, were SMGs in late-stage tumors, but were not statistically significantly mutated in early-stage tumors. All KLF3 mutations, and almost all PAX6 mutations, were in the MSI-hypermutated EEC subgroup; within this subgroup, KLF3 and PAX6 mutations were more frequent in late-stage than early-stage tumors. The mutation spectrum of both genes included recurrent insertions-deletions (indels) at homopolymer tracts, consistent with strand slippage resulting from mismatch repair defects and suggesting that PAX6 and KLF3 are likely MSI target genes.

Flow diagram summarizing the approaches used in the step-wise analysis of somatic mutation data for (A) 15 late-stage endometrioid endometrial cancers (EECs) in the NHGRI tumor cohort, and for (B) 66 late-stage and 270 early-stage non-ultramutated EECs in The Cancer Genome Atlas (TCGA) tumor cohort.

Materials and methods

Ethics statement

The NHGRI cohort of de-identified, fresh-frozen endometrioid endometrial tumors and matched non-tumor (normal) samples were obtained from the Cooperative Human Tissue Network (CHTN). The National Institutes of Health Office of Human Subjects Research Protections determined that research using these specimens was exempt from IRB review. Because the specimens were obtained from CHTN as de-identified specimens with an agreement that we will never request re-identification, we do not have information on whether consent was written or oral.

NHGRI clinical specimens

For 15 cases in the NHGRI cohort, de-identified, fresh-frozen endometrioid endometrial tumors and matched non-tumor (normal) samples were obtained from the Cooperative Human Tissue Network (CHTN) (supporting Table A). The National Institutes of Health Office of Human Subjects Research Protections determined that this research was not human subject research, per the Common Rule (45 CFR 46). For each tumor sample, an H&E stained section was reviewed by an experienced gynecologic pathologist to identify regions containing ≥70% neoplastic cellularity; accompanying surgical pathology reports were retrospectively evaluated by the same gynecologic pathologist to annotate tumor stage using the International Federation of Gynecology and Obstetrics (FIGO) 2009 classification (supporting Table A).

Genomic DNA preparation and next-generation sequencing

Genomic DNA extraction, identity testing and MSI analysis of tumor and normal samples in the NHGRI cohort were performed as previously described [19]. DNA was purified by phenol-chloroform extraction prior to library preparation. DNA libraries were prepared using the SeqCap EZ Exome + UTR capture kit (Roche) and sequenced with the Illumina HiSeq 2000 platform (Illumina). A flow diagram summarizing the approaches and methods used to generate and analyze the NHGRI exomes is provided in the supporting information.

Alignment and variant calling

Short sequence reads from NHGRI cohort exomes were aligned to the hg19 human reference sequence (University of California at Santa Cruz) using NovoAlign version 2.08.02. Four somatic mutation detection algorithms, Strelka [20], Shimmer [21], SomaticSniper [22], and MuTect [23], were used to call potential somatic variants. Insertions and deletions (indels) were identified by Shimmer and Strelka, while single nucleotide variants (SNVs) were identified by all four somatic algorithms. Strelka workflow version 1.0.14 (https://doi.org/10.1093/bioinformatics/bts271) was run with default parameters. Shimmer version 0.2 (https://github.com/nhansen/shimmer) was run with --min_som_reads set to 6 and --minqual set to 20 [21]. SomaticSniper version 1.0.5 was run with options -Q 40 -G -L, followed by the "standard somatic detection filters" described in Larson et al. [22]. MuTect version 1.1.5 was run with default parameters, and data were then filtered to include only calls designated as "KEEP" in the program’s output [23]. Following analysis with each algorithm, a VarSifter-formatted file was generated containing the somatic variant allele frequencies observed in each tumor and matched normal sample for every called variant [24]. ANNOVAR (downloaded on August 12, 2014) was used to annotate all variants using the UCSC "known genes" gene structures [25].

Variant filtering

Coding, splicing, and non-coding (intronic, 3’ or 5’ untranslated region (UTR), and 1kb upstream of the transcription start or downstream of the transcription end site) somatic variant calls in the NHGRI cohort were displayed using VarSifter [24]. We prioritized mutations for the NHGRI tumors using criteria similar to those that have been shown to yield accurate mutation datasets in past studies [26-31]. A minimum of 14 reads covering a site in the tumor and 8 in the normal were required for mutation calling [26,27]; potential germline variants (those with a variant allele frequency (VAF) of greater than 3% in matched normal samples) were excluded. Coding and splice-site single nucleotide variants (SNVs) were annotated against dbSNP Build 135 and nonpathogenic single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) greater than 5% were excluded. Indel variants that were present in dbSNP Build 135 were excluded without further evaluation of MAF. SNVs called by all four algorithms and indels called by either Strelka or Shimmer were retained and further annotated against GENCODE hg19 using Oncotator (v1.5.3.0) (http://www.broadinstitute.org/oncotator) [32]; noncoding variants, those with a variant classification of UTR, Flank, lincRNA, RNA, Intron, or De novo start were excluded.
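The coverage, germline, and caller-agreement criteria above can be summarized in a short Python sketch. This is an illustration only, not the code used in the study: the `Call` record and its field names are hypothetical, and the dbSNP and Oncotator classification steps are omitted.

```python
from dataclasses import dataclass, field

# Thresholds are taken from the text; the record layout is hypothetical.
MIN_TUMOR_READS = 14
MIN_NORMAL_READS = 8
MAX_NORMAL_VAF = 0.03  # calls with normal VAF above 3% are treated as germline
SNV_CALLERS = {"Strelka", "Shimmer", "SomaticSniper", "MuTect"}
INDEL_CALLERS = {"Strelka", "Shimmer"}


@dataclass
class Call:
    kind: str            # "SNV" or "indel"
    tumor_depth: int     # reads covering the site in the tumor
    normal_depth: int    # reads covering the site in the matched normal
    normal_vaf: float    # variant allele frequency in the matched normal
    callers: set = field(default_factory=set)


def passes_filters(call: Call) -> bool:
    """Apply the read-depth, germline, and caller-agreement criteria."""
    if call.tumor_depth < MIN_TUMOR_READS or call.normal_depth < MIN_NORMAL_READS:
        return False
    if call.normal_vaf > MAX_NORMAL_VAF:
        return False
    if call.kind == "SNV":
        # SNVs must be reported by all four algorithms.
        return SNV_CALLERS <= call.callers
    if call.kind == "indel":
        # Indels must be reported by Strelka or Shimmer.
        return bool(INDEL_CALLERS & call.callers)
    return False
```

Requiring agreement across all four callers for SNVs trades sensitivity for specificity, which fits the stated goal of a high-confidence call set.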

TCGA data analysis

A subset of TCGA Uterine Corpus Endometrial Carcinoma (UCEC) somatic mutation data (TCGA UCEC PanCancer Atlas [16]) was extracted from the MC3 Public MAF file (mc3.v0.2.8.PUBLIC.maf.gz, https://gdc.cancer.gov/about-data/publications/mc3-2017) [33]. Briefly, the MC3 Public MAF file was filtered to include somatic variants from 336 EECs from the MSI-hypermutated (n = 141), copy number-low/MSS (n = 140) or copy number-high (n = 55) molecular subgroups; variants from EECs within the ultramutated-POLE molecular subgroup or those without a molecular subgroup assignment were excluded (supporting Table B). The TCGA mutation dataset used in our manuscript had been previously filtered to retain only the highest quality calls using both coverage and population frequency information [33]. Molecular subtype annotation for each sample was obtained from the cBioPortal for Cancer Genomics [34,35]. Variants with a PASS, WGA, or Native_WGA_mix designation, as described in [33], were retained and further filtered to include SNVs called by MuTect and indels called by Indelocator [16]. The final set of selected variants was annotated against GENCODE hg19 using Oncotator (v1.5.3.0) (http://www.broadinstitute.org/oncotator) [32]; noncoding variants (those with a variant classification of UTR, Flank, lincRNA, RNA, Intron, or De novo start) were excluded. Additional clinicopathologic information for each tumor, including histology, stage, and grade, was obtained from Berger et al. [16] and the cBioPortal for Cancer Genomics (URL: https://www.cbioportal.org/) [34,35] (supporting Table B). Early-stage tumors were defined herein as stage I or II; late-stage tumors were defined as stage III or IV. A flow diagram summarizing the approaches and methods used to analyze the TCGA mutation calls is provided in the supporting information.
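The early-stage versus late-stage definition can be expressed as a small Python helper. This is a minimal sketch that assumes stage labels follow the FIGO 2009 pattern (for example "Stage IA" or "IIIC1"); the function name and accepted label format are assumptions for illustration, not the annotation format actually used in the TCGA clinical files.

```python
import re
from typing import Optional

# Roman-numeral stage, optional substage letter (A-C) and optional digit,
# e.g. "IA", "II", "IIIC1", "IVB". Longer numerals are tried first.
_STAGE = re.compile(r"^(?:STAGE\s+)?(IV|III|II|I)[ABC]?\d?$")


def stage_group(stage: str) -> Optional[str]:
    """Return "early" for stage I/II, "late" for stage III/IV, else None."""
    match = _STAGE.match(stage.strip().upper())
    if match is None:
        return None  # unrecognized or missing stage label
    return "early" if match.group(1) in ("I", "II") else "late"
```

Returning `None` for unrecognized labels keeps unstaged cases out of both cohorts rather than silently assigning them to one.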

Power analysis

MutSigCV’s statistical power to detect SMGs was estimated using the binomial model described in [36]. Briefly, the probability of obtaining a p-value ≤ 0.1/14 (a threshold of 0.1 adjusted for 14 tests) was calculated assuming a background (null) mutation probability of $p_0 = 1 - (1 - \mu f)^{3L/4}$, where $\mu$ is the background mutation rate, and $f = 3.9$ and $L = 1500$ are the 90th percentile gene-specific mutation rate factor and gene length, respectively. We also assumed a signal mutation rate of $p_1 = p_0 + r(1 - m)$, where $r$ is the frequency of non-silent mutations in tumor samples and $m = 0.1$ is the mis-detection rate. Power estimates were performed and plotted for a range of mutation rates and frequencies using an R script available at https://github.com/nhansen/LateStageEECs.
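For readers who want to see how such a binomial power estimate works, the following Python sketch may help. It is not the authors' R script. It assumes that the number of mutated tumors follows a binomial distribution, that the exponent in the background probability is 3L/4, and that mu is expressed per base pair; the function names are illustrative.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power(n_tumors: int, mu: float, r: float,
          f: float = 3.9, length: int = 1500,
          m: float = 0.1, alpha: float = 0.1 / 14) -> float:
    """Estimated power to call a gene significant at level alpha."""
    # Null probability that a tumor carries a non-silent mutation in the gene.
    p0 = 1 - (1 - mu * f) ** (3 * length / 4)
    # Probability under the alternative, capped at 1.
    p1 = min(1.0, p0 + r * (1 - m))
    # Smallest mutated-tumor count whose null tail probability is <= alpha.
    k = next(k for k in range(n_tumors + 2) if binom_sf(k, n_tumors, p0) <= alpha)
    # Power is the chance of reaching that count under the alternative.
    return binom_sf(k, n_tumors, p1)
```

For example, `power(270, 1e-6, 0.10)` estimates the power for 270 tumors, a background rate of one mutation per megabase, and a gene mutated in 10% of tumors.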

Annotation of SMGs

SMGs were annotated using MutSigCV (v1.4). Briefly, MutSigCV (v1.4) was run on the NIH high-performance computing Biowulf cluster (http://hpc.nih.gov) using the coverage, covariate, and mutation type dictionary files provided by the Broad Institute. Filtered somatic variants for each data set were annotated against GENCODE hg19 using Oncotator (http://www.broadinstitute.org/oncotator) [32], noncoding variants were excluded in accordance with a published approach [37], and the resulting coding mutation annotation format (maf) files were uploaded to the Biowulf cluster. Somatically mutated genes with a false discovery rate (q-value) ≤0.10 were defined as SMGs in accordance with a published approach [36].

Determining whether late-stage SMGs are statistically significantly mutated in early-stage tumors

MutSigCV (v1.4) was run as described above on the set of filtered somatic variants from the 270 early-stage EECs to obtain p-values for all mutated genes. For all genes annotated as SMGs in late-stage tumors, q-values were re-calculated from the MutSigCV (v1.4) p-values assigned to the early-stage data, adjusting for 14 tests (reflecting the total number of SMGs identified in late-stage tumors).
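The q-value re-calculation over a restricted set of genes can be sketched in Python as follows. This illustration uses the Benjamini-Hochberg procedure, which the Results name as the adjustment method; the function names are hypothetical and the code is not taken from the study's scripts.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def is_smg(qvalue: float, threshold: float = 0.10) -> bool:
    """A gene is called an SMG when its q-value is at or below the threshold."""
    return qvalue <= threshold
```

In the setting described above, the input would be the 14 early-stage p-values for the late-stage SMGs, so that the correction reflects 14 tests rather than every mutated gene.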

In silico prediction of functional consequences for somatic variants

MutationAssessor [38], PROVEAN (Protein Variation Effect Analyzer) [39], SIFT (Sorting Intolerant From Tolerant) [40], and PolyPhen-2 (Polymorphism Phenotyping v2) [41], were used to predict the effects of missense mutations on protein function. For each algorithm, the following descriptors were considered as impacting protein function: “high” (MutationAssessor), “deleterious” (PROVEAN), “damaging” (SIFT), and “probably-damaging” (PolyPhen-2). Agreement across at least three of the four prediction methods was required to assign an overall determination of “functional impact” for a missense mutation.
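The three-of-four consensus rule can be written as a brief Python sketch. The dictionary-based input format is an assumption made for illustration; the impact labels are those listed above.

```python
# Label that counts as "impacting protein function" for each tool (from the text).
IMPACT_LABELS = {
    "MutationAssessor": "high",
    "PROVEAN": "deleterious",
    "SIFT": "damaging",
    "PolyPhen-2": "probably-damaging",
}


def has_functional_impact(predictions: dict, min_agree: int = 3) -> bool:
    """True when at least min_agree tools report their impact label.

    predictions maps a tool name to the label it returned; tools missing
    from the mapping are counted as not predicting an impact.
    """
    hits = sum(
        predictions.get(tool, "").lower() == label
        for tool, label in IMPACT_LABELS.items()
    )
    return hits >= min_agree
```

A missense mutation scored "high", "deleterious", and "damaging" by three tools but only "possibly-damaging" by PolyPhen-2 would still be assigned a functional impact under this rule.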

Survival analyses

We utilized the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) to query the relationship between SMG mutation status and survival (overall-, disease-free-, progression-free-, and disease-specific-survival) stratifying cases by stage (all stages, early-stage, late-stage) and molecular subgroup (MSI-hypermutated, CN-low, CN-high, all non-ultramutated), and applying a Bonferroni correction to account for multiple testing.

Results

Identification of SMGs among late-stage EECs

For the NHGRI late-stage cohort (n = 15), the average depth of coverage within regions targeted by the capture kit for tumor and normal samples was 67.2x and 65.5x, respectively; 90.87% of targeted bases for each tumor/normal pair had sufficient coverage for variant calling (supporting Table C). Using a combination of somatic variant calling algorithms and stringent filtering parameters, we identified 2,879 high-confidence coding and splice-site somatic variants (consisting of 2,214 nonsynonymous (1,405 SNVs, 809 indels), 92 splice-site, and 573 synonymous variants) (supporting Table D). Combined, the 2,306 nonsynonymous and splice-site variants affected 1,968 protein-coding genes and averaged 153.7 variants per tumor (range 9–542 per tumor) (supporting Tables D and E). For the TCGA late-stage cohort (n = 66), we extracted a total of 28,996 somatic coding and splice-site variants distributed among 10,504 protein-encoding genes (supporting Tables F and G). Using MutSigCV (v1.4), we identified a total of 14 unique late-stage SMGs, representing 6 SMGs (q-value ≤0.1) in the NHGRI (Table 1) and 12 SMGs in the TCGA late-stage EEC cohorts.

Statistically significantly mutated genes (SMGs) in late-stage EEC cohorts.

Venn diagram showing gene names for SMGs identified by MutSigCV (v1.4) analysis of 15 NHGRI late-stage EECs and 66 TCGA late-stage EECs. Late-stage tumors were defined as stage III or stage IV tumors. SMGs are defined as genes mutated in a tumor cohort at statistically significantly (q-value ≤0.10) higher rates than the background mutation rate.

KLF3 and PAX6 are SMGs in late-stage but not early-stage EEC

To test whether each of the 14 late-stage SMGs is also statistically significantly mutated in the TCGA early-stage EECs (n = 270), we first estimated MutSigCV’s power to detect genes as significantly mutated in the early-stage cohort. Estimating power using a binomial model as described in [42], we determined that the data from 270 tumors, when tested on 14 genes, yield >95% power to detect genes as significantly mutated across a wide range of background mutation rates when at least 10% of the 270 tumors are mutated in that gene. Next, we obtained somatic variants for the cohort of non-ultramutated TCGA early-stage EECs; there were 162,763 somatic coding and splice-site variants affecting 17,435 protein-encoding genes (supporting Tables H and I). To determine whether any of the 14 late-stage SMGs were significantly mutated in this dataset, p-values for all somatically mutated genes in early-stage tumors were calculated and used to determine q-values adjusting for 14 tests (reflecting the 14 late-stage SMGs queried) using the Benjamini-Hochberg procedure [43]. Results showed that 12 of 14 late-stage SMGs were statistically significantly mutated (q-value <0.1) in early-stage EECs, whereas two late-stage SMGs, KLF3 and PAX6, were not. Somatic mutations were more frequent among late-stage tumors than early-stage tumors for both KLF3 (10.6% (7 of 66) late-stage vs 4.8% (13 of 270) early-stage) and PAX6 (10.6% (7 of 66) late-stage vs 1.9% (5 of 270) early-stage). We constructed Q-Q plots to verify that our q-values, calculated using the Benjamini-Hochberg procedure on MutSigCV’s p-values, are the result of real statistical significance and not stratification of our dataset. The Q-Q plots show significant deviation from ideal behavior due to MutSigCV’s testing model [44] and the limited number of tumors analyzed.

KLF3 and PAX6 mutations occur in MSI-hypermutated EEC and are predicted to affect protein function

For the TCGA cohorts, we evaluated the distribution of KLF3 and PAX6 mutations across the MSI-hypermutated (n = 141 cases), CN-low (n = 140 cases), and CN-high (n = 55 cases) molecular subgroups. KLF3 mutations occurred exclusively in the MSI-hypermutated subgroup at an overall frequency of 14.2% (20 of 141 cases), which was significantly higher than the occurrence of KLF3 mutations among the combined CN-high and CN-low subgroups (0 of 195 cases) (p-value < 0.0001, 2-tailed Fisher’s exact test). Within the MSI-hypermutated subgroup, KLF3 was mutated in 25.9% (7 of 27) of late-stage tumors versus 11.4% (13 of 114) of early-stage tumors. There were no statistically significant differences in KLF3 mutation frequency according to tumor grade; mutations were present in 14.2% of grade 1 (4 of 28), 8.1% of grade 2 (3 of 37), and 13.2% of grade 3 (11 of 83) MSI tumors (supporting Table J). All but one (11 of 12) of the PAX6 mutations were in the MSI-hypermutated subgroup; the PAX6 X306_splice mutation was present in a CN-low tumor. The higher frequency of PAX6 mutations in the MSI-hypermutated subgroup compared to other subgroups was statistically significant (p-value = 0.0004, 2-tailed Fisher’s exact test). Within the MSI-hypermutated subgroup, PAX6 was mutated in 7.8% (11 of 141) of tumors; mutations were more frequent in late-stage tumors than in early-stage tumors (25.9% (7 of 27) versus 3.5% (4 of 114)). There was no significant difference in the frequency of PAX6 mutations between tumors of differing grade; PAX6 mutations were present in 3.6% of grade 1 (1 of 28), 13.5% of grade 2 (5 of 37) and 7.9% of grade 3 (6 of 76) MSI-hypermutated tumors (supporting Table J). We observed no statistically significant differences in KLF3 or PAX6 mutation frequencies between POLE/POLD1-mutated and POLE/POLD1-wildtype cases within the MSI-hypermutated subgroup (supporting Table K).
A majority of KLF3 and PAX6 mutations were indels within homopolymer tracts, resulting in frameshifts; the KLF3 K106Nfs*21, KLF3 P226Rfs*52, KLF3 Q227Afs*37, and PAX6 P375Hfs*7 frameshift mutations were recurrent. Six of 21 (28.6%) KLF3 mutations and 3 of 11 (27.3%) PAX6 mutations were missense mutations; KLF3 R257W, KLF3 R261G and PAX6 A33T were predicted to affect protein function by 3 of 4 in silico algorithms (supporting Table L).
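The subgroup comparisons reported in this section rest on a two-tailed Fisher’s exact test applied to 2x2 tables of mutated versus non-mutated tumors. A minimal Python sketch of that test is given below, using only the standard library; the software actually used for these tests is not stated in the text, so this is an illustration of the calculation, not the study's code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x first-column counts in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# KLF3: 20 of 141 MSI-hypermutated tumors mutated vs 0 of 195 other tumors.
p_klf3 = fisher_exact_two_sided(20, 121, 0, 195)
# PAX6: 11 of 141 MSI-hypermutated tumors mutated vs 1 of 195 other tumors.
p_pax6 = fisher_exact_two_sided(11, 130, 1, 194)
```

The two tables are built directly from the counts given in the text (141 MSI-hypermutated tumors and 195 combined CN-low and CN-high tumors).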

Spectrum of KLF3 and PAX6 somatic mutations in late-stage and early-stage non-ultramutated EECs.

Lollipop plots showing the positions of somatic mutations in (A) KLF3 and (B) PAX6 relative to protein domains. Mutations in late-stage EEC (orange and red) cohorts, and the TCGA early-stage (blue) EEC cohort are distinguished. Abbreviations used: aa (amino acids); CtBP (C-terminal Binding Protein); C2H2 (Cysteine-Cysteine-Histidine-Histidine); fs (frameshift); P/S/T (Proline/Serine/Threonine).

Survival analysis

We utilized the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) to query the relationship between patient survival and somatic mutation status of all 14 late-stage SMGs identified herein, applying a Bonferroni correction to account for multiple testing (456 tests). With respect to KLF3 and PAX6 in the MSI-hypermutated subgroup, no significant differences in overall survival (OS), progression-free survival (PFS), disease-free survival (DFS) or disease-specific survival (DSS) were observed between mutated and non-mutated tumors when all stages were combined or when early- and late-stage tumors were considered separately (supporting Table M). For the remaining 12 SMGs, there were no statistically significant differences in survival for any stage or molecular subgrouping (supporting Tables N through V).

Discussion

The mutational landscape of EEC was reported by TCGA in an initial 2013 study and a subsequent “pan-gyn” study which included the 2013 EEC cohort and additional cases. Both studies performed in silico annotation of SMGs, which represent candidate driver genes, in a stage-agnostic manner. However, cancer genomes are dynamic and the mutational repertoire of tumors can evolve during progression and metastasis [45]. Recent comparisons of primary and metastatic endometrial cancer genomes have demonstrated divergence in their mutational landscapes [46-48], but exome-wide comparisons of late-stage and early-stage primary tumors are lacking. Here, our stage-specific analysis of TCGA mutation data for non-ultramutated EECs showed that KLF3 and PAX6 are SMGs in late-stage (III/IV) but not early-stage (I/II) disease, raising the possibility that KLF3 and PAX6 mutations undergo positive selection during tumor progression.
KLF3 encodes a zinc finger transcription factor with roles in adipogenesis, erythroid maturation, B-cell differentiation, and cardiovascular development (reviewed in [49]). In the Human Protein Atlas, KLF3 expression was detected by immunohistochemistry at “medium” levels in the normal glandular epithelium of the endometrium (https://www.proteinatlas.org/ENSG00000109787-KLF3/tissue/endometrium). The encoded protein includes an N-terminal CtBP-binding motif, three C-terminal Cys2His2 zinc finger domains, and a primary phosphorylation site at serine-249 that is important for DNA binding and enhancing transcriptional repression [49]. In our analysis of NHGRI EEC exomes and TCGA mutation data, the majority of KLF3 mutations, including three mutation hotspots, were frameshift mutations that occur N-terminal to the zinc finger domains and to serine-249.
Because frameshift mutations often generate a downstream premature stop codon, they may result in the production of a truncated protein, or the transcript may be subjected to nonsense-mediated decay resulting in haploinsufficiency [50]. Based on the positional rules for nonsense-mediated decay [51], it is likely that the KLF3 frameshift mutations among the ECs in this study result in nonsense-mediated decay and haploinsufficiency because the associated premature stop codons are located more than 50–55 nucleotides upstream of the final exon-exon junction [51]. In addition, in silico analyses predicted deleterious effects for the KLF3 R257W and KLF3 R261G missense mutants that occur in EEC; KLF3 R257W also occurs somatically in 2 colorectal cancers (1 MSI-high/CIMP (CpG island methylator phenotype)-low; 1 CIN (chromosome instability)-subgroup) [52,53]. The fact that KLF3 mutations in EEC occur predominantly at homopolymer tracts, were restricted to the MSI-hypermutated EEC subgroup, and were more frequent in late-stage than early-stage MSI-hypermutated tumors (25.9% versus 11.4%, respectively) indicates that KLF3 is an MSI target gene that may be involved in the etiology and progression of a subset of hypermutated EECs. Consistent with the idea that KLF3 is an MSI target gene, frameshift mutations at codons 106 and 227, which are recurrent in MSI-EECs, are also recurrent in the MSI-colorectal and MSI-stomach TCGA molecular subgroups [35,54,55]. Studies in other tumor types have reported KLF3 alterations as adverse prognosticators. For example, decreased KLF3 expression in colorectal and cervical cancers is associated with lymph node positivity and poorer outcomes [56,57]. Conflicting data exist regarding the occurrence and effects of reduced KLF3 levels in lung cancer.
However, one study reported lower levels of KLF3 mRNA and protein expression in lung adenocarcinomas compared with adjacent normal tissues and more frequent loss of KLF3 expression in late- versus early-stage disease [58]. Although we found KLF3 is a late-stage-specific SMG in EEC, there was no significant association between KLF3 mutation status and survival for EEC patients, possibly reflecting tissue-specific differences in KLF3 association with outcome, and/or outcome differences between mutation and reduced expression of KLF3. The second late-stage-specific SMG identified in our study was PAX6. PAX6 encodes a highly conserved paired box transcription factor that includes paired box and homeobox DNA-binding domains and a C-terminal transactivation domain (TAD); the final 40 residues of the TAD influence homeobox-DNA binding [59]. In the Human Protein Atlas, PAX6 expression was undetectable by immunohistochemical analysis of the normal glandular epithelium of the endometrium (https://www.proteinatlas.org/ENSG00000007372-PAX6/tissue/endometrium). PAX6 has important roles in the development of several tissue types, including the eye (reviewed in [60]). Inherited and de novo nonsense and frameshift mutations in PAX6 cause the autosomal dominant eye disorder aniridia 1, whereas germline missense mutations are associated with attenuated ocular phenotypes [61]. Dysregulation of PAX6 expression has been implicated in a variety of human cancers, resulting in tumor suppressive or oncogenic phenotypes depending on the cellular context [62-74]. A recent study reported a potential role for epigenetic silencing of PAX6 in EC progression based on hypermethylation of PAX6 in primary EC versus endometrial hyperplasia, and in metastatic EC versus primary EC [75]. Our analysis of TCGA mutation data found that PAX6 mutations almost exclusively occur in MSI-hypermutated tumors. 
This observation, coupled with the fact that PAX6 mutations were more frequent among late-stage than early-stage MSI-hypermutated tumors (25.9% versus 3.5%, respectively), raises the possibility that, like KLF3 mutations, PAX6 mutations may be pathogenic drivers of tumor progression in the context of MSI-hypermutated EECs. Most PAX6 mutations in TCGA MSI-hypermutated EECs were the recurrent PAX6 P375Hfs*7 frameshift mutation in the transactivation domain [2,16]. We predict that PAX6 P375Hfs*7 and an adjacent PAX6 H376Tfs*36 frameshift mutation encode truncated proteins with reduced transactivation capacity, because the associated premature stop codons are located within 50 nucleotides of the penultimate exon-exon junction [51] and are located proximal to a synthetic nonsense mutation (PAX6 Q422X) that exhibits reduced transactivation capacity in vitro [76]. Moreover, the fact that the PAX6 P375Q aniridia-associated missense mutation results in attenuated DNA binding affinity in vitro [76] raises the possibility that the recurrent PAX6 P375Hfs*7 mutant may also have attenuated DNA binding. Similar to KLF3 frameshift mutations, the PAX6 P375Hfs*7 and PAX6 H376Tfs*36 frameshift mutations in EEC both arise within a (C)7 homopolymer tract, indicating that PAX6 is an MSI target gene. Consistent with this idea is the fact that PAX6 frameshift mutations originating at codon 375 and/or codon 376 are also recurrent in MSI-stomach cancer and MSI-colorectal carcinoma [34,35,77]. Compared to frameshift mutations, PAX6 missense mutations are relatively rare in the non-ultramutated TCGA cohort, occurring in three cases. The PAX6 A33T EC-mutant occurs in the N-terminal paired box domain at a residue highly conserved across paired domains in Pax family members and other proteins and is predicted to impact function [78]. A different substitution at this residue (PAX6 A33P) exhibits altered transactivation activity in vitro and is a germline variant associated with partial aniridia [78,79].
The other two PAX6 missense mutations in EC (PAX6 E220G and PAX6 G141S) were not uniformly predicted to be functionally significant in our analysis and, to our knowledge, are not pathogenic variants for ocular phenotypes.
In conclusion, our findings indicate that KLF3 and PAX6 are candidate driver genes in a subset of late-stage hypermutated EECs and are MSI target genes. Despite sufficient power, neither KLF3 nor PAX6 was detected as a candidate driver gene in early-stage EECs. To our knowledge, this is the first study to annotate KLF3 and PAX6 as late-stage-specific SMGs in EEC. Our findings warrant future studies to independently validate the enrichment of PAX6 and KLF3 mutations in late-stage, MSI-hypermutated EECs, to determine expression levels of KLF3 and PAX6 proteins in endometrial tumors, and to determine the functional effects of recurrent frameshift mutations in these genes, particularly in regard to phenotypic properties associated with tumor progression.

Flow diagram summarizing the approaches and methods used in the step-wise generation and analysis of NHGRI somatic mutation data for 15 late-stage endometrioid endometrial cancers (EECs) in the NHGRI tumor cohort.


Flow diagram summarizing the approaches and methods used in the step-wise analysis of TCGA somatic mutation calls for 336 non-ultramutated endometrioid endometrial cancers (EECs) within the Uterine Corpus Endometrial Carcinoma (UCEC) cohort.


Power to detect significantly mutated genes (SMGs) in early-stage tumors.

Curves show statistical power for different percentages (r) of tumors that are somatically mutated. Calculations were performed as described in the text, assuming 270 tumors and 14 gene tests completed.

Q-Q plots for MutSigCV’s p-values for differing mutation rates in background genes in (A) the 15 late-stage tumors sequenced and analyzed at NHGRI and (B) the set of 66 late-stage tumors from the TCGA project. Deviation from uniform p-value distribution here is a result of MutSigCV’s assigned p-value and probably due to the limited number of tumors analyzed.

Contains supporting Tables A through V.

Table 1

SMGs (q≤0.10) identified within the NHGRI cohort of 15 late-stage EEC exomes.

| Gene symbol | Gene name | Number and frequency (%) of NHGRI late-stage tumors with non-silent mutation(s)** | p-value | q-value |
|---|---|---|---|---|
| PTEN | Phosphatase and tensin homolog | 13 (86.7%) | 2.11E-15 | 3.98E-11 |
| ARID1A | AT-rich interaction domain 1A | 11 (73.3%) | 1.27E-10 | 1.20E-06 |
| RPL22 | Ribosomal protein L22 | 4 (26.7%) | 1.44E-07 | 9.07E-04 |
| OR6C75 | Olfactory receptor family 6 subfamily C member 75 | 4 (26.7%) | 7.84E-06 | 3.70E-02 |
| CTCF | CCCTC-binding factor | 5 (33.3%) | 1.26E-05 | 4.74E-02 |
| AP1S1 | Adaptor related protein complex 1 subunit sigma 1 | 3 (20.0%) | 2.13E-05 | 6.70E-02 |

** Non-silent mutations consist of nonsynonymous and splice junction mutations.

Table 2

SMGs (q≤0.10) identified among 66 late-stage TCGA EECs.

| Gene symbol | Gene name | ξ Number and frequency (%) of TCGA late-stage tumors with non-silent mutation(s)** | p-value | q-value |
|---|---|---|---|---|
| ARID1A | AT-rich interaction domain 1A | 29 (43.9%) | 0 | 0 |
| PIK3R1 | Phosphoinositide-3-kinase regulatory subunit 1 | 21 (31.8%) | 5.55E-16 | 5.24E-12 |
| PTEN | Phosphatase and tensin homolog | 46 (69.7%) | 2.00E-15 | 1.26E-11 |
| PIK3CA | Phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha | 25 (37.9%) | 7.27E-14 | 3.43E-10 |
| TP53 | Tumor protein p53 | 19 (28.8%) | 3.07E-13 | 1.16E-09 |
| KRAS | KRAS proto-oncogene, GTPase | 18 (27.3%) | 7.22E-11 | 2.27E-07 |
| RPL22 | Ribosomal protein L22 | 6 (9.1%) | 4.89E-08 | 1.08E-04 |
| CTCF | CCCTC-binding factor | 12 (18.2%) | 4.99E-08 | 1.08E-04 |
| CTNNB1 | Catenin beta 1 | 18 (27.3%) | 5.16E-08 | 1.08E-04 |
| PAX6 | Paired box 6 | 7 (10.6%) | 1.04E-06 | 1.96E-03 |
| RNF43 | Ring finger protein 43 | 9 (13.6%) | 4.61E-06 | 7.91E-03 |
| KLF3 | Kruppel like factor 3 | 7 (10.6%) | 6.34E-06 | 9.96E-03 |

ξ Data were extracted from previously published TCGA data.

** Non-silent mutations consist of nonsynonymous and splice junction mutations.

Table 3

PAX6 and KLF3 are the only late-stage EEC SMGs (q-value ≤0.1) that are not statistically significantly mutated in early-stage EEC.

| Gene symbol for 14 SMGs (q-value ≤0.1), identified in late-stage EEC cohorts | MutSigCV (1.4) p-value in early-stage EEC TCGA cohort | MutSigCV (1.4) q-value in early-stage EEC TCGA cohort corrected for 14 tests |
|---|---|---|
| AP1S1 | 0.00E+00 | 0.00E+00 |
| ARID1A | 0.00E+00 | 0.00E+00 |
| CTNNB1 | 0.00E+00 | 0.00E+00 |
| PIK3CA | 0.00E+00 | 0.00E+00 |
| PTEN | 0.00E+00 | 0.00E+00 |
| RNF43 | 0.00E+00 | 0.00E+00 |
| RPL22 | 0.00E+00 | 0.00E+00 |
| TP53 | 1.11E-15 | 1.94E-15 |
| CTCF | 2.78E-15 | 4.05E-15 |
| PIK3R1 | 2.89E-15 | 4.05E-15 |
| KRAS | 6.88E-15 | 8.76E-15 |
| OR6C75 | 1.89E-03 | 2.21E-03 |
| PAX6 | 1.29E-01 | 1.39E-01 |
| KLF3 | 9.99E-01 | 9.99E-01 |
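
The q-values in Table 3 are described only as corrected for 14 tests. The sketch below assumes that this correction is a Benjamini-Hochberg false discovery rate adjustment applied to the 14 early-stage p-values; that is an assumption, not something stated in this section, although it reproduces the tabulated q-values to the rounding shown. The function name `benjamini_hochberg` and the dictionary `early_stage_p` are illustrative names, not part of MutSigCV.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values, in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Early-stage MutSigCV p-values for the 14 late-stage SMGs (Table 3).
early_stage_p = {
    "AP1S1": 0.0, "ARID1A": 0.0, "CTNNB1": 0.0, "PIK3CA": 0.0,
    "PTEN": 0.0, "RNF43": 0.0, "RPL22": 0.0,
    "TP53": 1.11e-15, "CTCF": 2.78e-15, "PIK3R1": 2.89e-15,
    "KRAS": 6.88e-15, "OR6C75": 1.89e-03,
    "PAX6": 1.29e-01, "KLF3": 9.99e-01,
}

q = dict(zip(early_stage_p, benjamini_hochberg(list(early_stage_p.values()))))

# Genes that do not reach the q <= 0.10 threshold used for SMGs.
not_significant = {gene for gene, value in q.items() if value > 0.10}

for gene, p in early_stage_p.items():
    status = "not significant" if gene in not_significant else "SMG"
    print(f"{gene:7s} p={p:.2e} q={q[gene]:.2e} {status}")
```

Under this assumption, only PAX6 and KLF3 exceed the q ≤ 0.10 threshold, matching the table caption.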

Table 4

Frequency of non-silent KLF3 and PAX6 mutations in non-ultramutated EECs, according to molecular subgroup.

| ξTumor stage and molecular subgroup | ξKLF3 mutation frequency | ξPAX6 mutation frequency |
|---|---|---|
| All stages of EEC (n = 336) | 5.9% (20 of 336) | 3.6% (12 of 336) |
| MSI subgroup (n = 141) | 14.2% (20 of 141) | 7.8% (11 of 141) |
| CN low subgroup (n = 140) | 0% (0 of 140) | 0.7% (1 of 140) |
| CN high subgroup (n = 55) | 0% (0 of 55) | 0% (0 of 55) |
| Late-stage EECs (n = 66) | 10.6% (7 of 66) | 10.6% (7 of 66) |
| MSI subgroup (n = 27) | 25.9% (7 of 27) | 25.9% (7 of 27) |
| CN low subgroup (n = 21) | 0% (0 of 21) | 0% (0 of 21) |
| CN high subgroup (n = 18) | 0% (0 of 18) | 0% (0 of 18) |
| Early-stage EECs (n = 270) | 4.8% (13 of 270) | 1.9% (5 of 270) |
| MSI subgroup (n = 114) | 11.4% (13 of 114) | 3.5% (4 of 114) |
| CN low subgroup (n = 119) | 0% (0 of 119) | 0.8% (1 of 119) |
| CN high subgroup (n = 37) | 0% (0 of 37) | 0% (0 of 37) |

ξ Data were extracted from previously published TCGA data [16].
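
The late-stage versus early-stage comparison within the MSI subgroup can be recomputed from the counts in Table 4. The sketch below derives the percentages and then applies a one-sided Fisher's exact test as one possible way to gauge the enrichment. The choice of test is an assumption made here for illustration and is not described in this section as the authors' analysis; the helper names `percent` and `fisher_one_sided` are hypothetical.

```python
from math import comb


def percent(mutated, total):
    """Mutation frequency as a percentage rounded to one decimal place."""
    return round(100 * mutated / total, 1)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where a is the number
    of mutated late-stage tumors, b the unmutated late-stage tumors, c the
    mutated early-stage tumors, and d the unmutated early-stage tumors.
    """
    late_total = a + b
    mutated_total = a + c
    n = a + b + c + d
    upper = min(late_total, mutated_total)
    numerator = sum(
        comb(mutated_total, k) * comb(n - mutated_total, late_total - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, late_total)


# MSI-subgroup counts from Table 4: (mutated, total) per stage group.
msi_counts = {
    "PAX6": {"late": (7, 27), "early": (4, 114)},
    "KLF3": {"late": (7, 27), "early": (13, 114)},
}

p_values = {}
for gene, groups in msi_counts.items():
    late_mut, late_n = groups["late"]
    early_mut, early_n = groups["early"]
    p_values[gene] = fisher_one_sided(
        late_mut, late_n - late_mut, early_mut, early_n - early_mut
    )
    print(
        f"{gene}: late {percent(late_mut, late_n)}% vs "
        f"early {percent(early_mut, early_n)}%, "
        f"one-sided p = {p_values[gene]:.3g}"
    )
```

Because the late-stage MSI subgroup contains only 27 tumors, any p-value obtained this way should be read as exploratory, in keeping with the hypothesis-generating framing of the study.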

  77 in total

Review 1.  Old and new perspectives in the pharmacological treatment of advanced or recurrent endometrial cancer: Hormonal therapy, chemotherapy and molecularly targeted therapies.

Authors:  Angiolo Gadducci; Stefania Cosio; Andrea Riccardo Genazzani
Journal:  Crit Rev Oncol Hematol       Date:  2006-01-24       Impact factor: 6.312

2.  Comparative Molecular Analysis of Gastrointestinal Adenocarcinomas.

Authors:  Yang Liu; Nilay S Sethi; Toshinori Hinoue; Barbara G Schneider; Andrew D Cherniack; Francisco Sanchez-Vega; Jose A Seoane; Farshad Farshidfar; Reanne Bowlby; Mirazul Islam; Jaegil Kim; Walid Chatila; Rehan Akbani; Rupa S Kanchi; Charles S Rabkin; Joseph E Willis; Kenneth K Wang; Shannon J McCall; Lopa Mishra; Akinyemi I Ojesina; Susan Bullman; Chandra Sekhar Pedamallu; Alexander J Lazar; Ryo Sakai; Vésteinn Thorsson; Adam J Bass; Peter W Laird
Journal:  Cancer Cell       Date:  2018-04-02       Impact factor: 31.743

3.  Analysis of mutational signatures in primary and metastatic endometrial cancer reveals distinct patterns of DNA repair defects and shifts during tumor progression.

Authors:  Charles W Ashley; Arnaud Da Cruz Paula; Rahul Kumar; Diana Mandelker; Xin Pei; Nadeem Riaz; Jorge S Reis-Filho; Britta Weigelt
Journal:  Gynecol Oncol       Date:  2018-11-08       Impact factor: 5.482

4.  Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer.

Authors:  Christopher E Barbieri; Sylvan C Baca; Michael S Lawrence; Francesca Demichelis; Mirjam Blattner; Jean-Philippe Theurillat; Thomas A White; Petar Stojanov; Eliezer Van Allen; Nicolas Stransky; Elizabeth Nickerson; Sung-Suk Chae; Gunther Boysen; Daniel Auclair; Robert C Onofrio; Kyung Park; Naoki Kitabayashi; Theresa Y MacDonald; Karen Sheikh; Terry Vuong; Candace Guiducci; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Gordon Saksena; Douglas Voet; Wasay M Hussain; Alex H Ramos; Wendy Winckler; Michelle C Redman; Kristin Ardlie; Ashutosh K Tewari; Juan Miguel Mosquera; Niels Rupp; Peter J Wild; Holger Moch; Colm Morrissey; Peter S Nelson; Philip W Kantoff; Stacey B Gabriel; Todd R Golub; Matthew Meyerson; Eric S Lander; Gad Getz; Mark A Rubin; Levi A Garraway
Journal:  Nat Genet       Date:  2012-05-20       Impact factor: 38.330

5.  SomaticSniper: identification of somatic point mutations in whole genome sequencing data.

Authors:  David E Larson; Christopher C Harris; Ken Chen; Daniel C Koboldt; Travis E Abbott; David J Dooling; Timothy J Ley; Elaine R Mardis; Richard K Wilson; Li Ding
Journal:  Bioinformatics       Date:  2011-12-06       Impact factor: 6.937

6.  Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer.

Authors:  Katherine A Hoadley; Christina Yau; Toshinori Hinoue; Denise M Wolf; Alexander J Lazar; Esther Drill; Ronglai Shen; Alison M Taylor; Andrew D Cherniack; Vésteinn Thorsson; Rehan Akbani; Reanne Bowlby; Christopher K Wong; Maciej Wiznerowicz; Francisco Sanchez-Vega; A Gordon Robertson; Barbara G Schneider; Michael S Lawrence; Houtan Noushmehr; Tathiane M Malta; Joshua M Stuart; Christopher C Benz; Peter W Laird
Journal:  Cell       Date:  2018-04-05       Impact factor: 41.582

7.  Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity.

Authors:  Austin M Dulak; Petar Stojanov; Shouyong Peng; Michael S Lawrence; Cameron Fox; Chip Stewart; Santhoshi Bandla; Yu Imamura; Steven E Schumacher; Erica Shefler; Aaron McKenna; Scott L Carter; Kristian Cibulskis; Andrey Sivachenko; Gordon Saksena; Douglas Voet; Alex H Ramos; Daniel Auclair; Kristin Thompson; Carrie Sougnez; Robert C Onofrio; Candace Guiducci; Rameen Beroukhim; Zhongren Zhou; Lin Lin; Jules Lin; Rishindra Reddy; Andrew Chang; Rodney Landrenau; Arjun Pennathur; Shuji Ogino; James D Luketich; Todd R Golub; Stacey B Gabriel; Eric S Lander; David G Beer; Tony E Godfrey; Gad Getz; Adam J Bass
Journal:  Nat Genet       Date:  2013-03-24       Impact factor: 38.330

8.  Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Authors:  Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz
Journal:  Nature       Date:  2013-06-16       Impact factor: 49.962

9.  Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples.

Authors:  Kristian Cibulskis; Michael S Lawrence; Scott L Carter; Andrey Sivachenko; David Jaffe; Carrie Sougnez; Stacey Gabriel; Matthew Meyerson; Eric S Lander; Gad Getz
Journal:  Nat Biotechnol       Date:  2013-02-10       Impact factor: 54.908

10.  The genomic landscape and evolution of endometrial carcinoma progression and abdominopelvic metastasis.

Authors:  William J Gibson; Erling A Hoivik; Mari K Halle; Amaro Taylor-Weiner; Andrew D Cherniack; Anna Berg; Frederik Holst; Travis I Zack; Henrica M J Werner; Kjersti M Staby; Mara Rosenberg; Ingunn M Stefansson; Kanthida Kusonmano; Aaron Chevalier; Karen K Mauland; Jone Trovik; Camilla Krakstad; Marios Giannakis; Eran Hodis; Kathrine Woie; Line Bjorge; Olav K Vintermyr; Jeremiah A Wala; Michael S Lawrence; Gad Getz; Scott L Carter; Rameen Beroukhim; Helga B Salvesen
Journal:  Nat Genet       Date:  2016-06-27       Impact factor: 38.330
