Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma.

Michael E Feigin^1,2, Tyler Garvin³, Peter Bailey⁴, Nicola Waddell^5,6, David K Chang^4,7,8,9, David R Kelley¹⁰, Shimin Shuai¹¹, Steven Gallinger^12,13, John D McPherson¹⁴, Sean M Grimmond^4,6, Ekta Khurana¹⁵, Lincoln D Stein^11,16, Andrew V Biankin^4,9,17, Michael C Schatz^1,18,19, David A Tuveson^1,2,20.

Abstract

The contributions of coding mutations to tumorigenesis are relatively well known; however, little is known about somatic alterations in noncoding DNA. Here we describe GECCO (Genomic Enrichment Computational Clustering Operation) to analyze somatic noncoding alterations in 308 pancreatic ductal adenocarcinomas (PDAs) and identify commonly mutated regulatory regions. We find recurrent noncoding mutations to be enriched in PDA pathways, including axon guidance and cell adhesion, and newly identified processes, including transcription and homeobox genes. We identified mutations in protein binding sites correlating with differential expression of proximal genes and experimentally validated effects of mutations on expression. We developed an expression modulation score that quantifies the strength of gene regulation imposed by each class of regulatory elements, and found the strongest elements were most frequently mutated, suggesting a selective advantage. Our detailed single-cancer analysis of noncoding alterations identifies regulatory mutations as candidates for diagnostic and prognostic markers, and suggests new mechanisms for tumor evolution.

Year: 2017 PMID: 28481342 PMCID: PMC5659388 DOI: 10.1038/ng.3861

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

INTRODUCTION

Pancreatic ductal adenocarcinoma (PDA) is a highly lethal malignancy with a 5-year survival rate of 6%, due to therapy resistance and late stage at diagnosis[1]. A detailed understanding of the molecular alterations underlying PDA is required to uncover mechanisms of tumorigenesis and enable development of effective therapies. Exome sequencing efforts have revealed genes (KRAS, TP53, CDKN2A, SMAD4) and pathways (Wnt/Notch, transforming growth factor-β (TGF-β), axon guidance, cell adhesion) important for PDA progression[2,3]. However, the exome comprises less than 2% of the human genome. Whole-genome sequencing (WGS) analyses have uncovered an average somatic mutation rate of 2.64 mutations per megabase in PDA, indicating that PDA tumors often carry thousands of mutations, the vast majority of which are located in noncoding regions and are completely uncharacterized[4]. Relevance of noncoding mutations (NCMs) to cancer development was previously established with the discovery of highly recurrent mutations in the telomerase reverse transcriptase (TERT) promoter in sporadic and familial melanoma[5,6]. These mutations create binding motifs for ETS transcription factors and lead to increased TERT transcriptional activity[5,7]. Subsequent reports identified TERT promoter mutations in a wide range of human tumors, including glioblastoma and hepatocellular carcinoma[8]. TERT promoter mutations are the most common genetic alterations in bladder cancer and correlate with recurrence and survival, demonstrating the potential of NCMs to act as clinical biomarkers[9]. NCMs have also been demonstrated to drive tumor progression from intergenic elements. Somatic mutations in a subset of T-cell acute lymphoblastic leukemia cases generate binding sites for the MYB transcription factor, creating a super-enhancer driving expression of the TAL1 oncogene[10].
Recent analyses have pooled WGS data from multiple cancer types and hundreds of patients, identifying recurrent mutations in regulatory elements of several genes, including TERT[11-15]. While multi-cancer studies can identify ubiquitous cancer variants, in-depth analysis of individual cancer subtypes is required for uncovering disease-specific alterations[16]. To detect somatic NCMs in PDA, we developed a computational pipeline to analyze WGS data of 308 PDA tumors from the International Cancer Genome Consortium (ICGC)[17]. We used FunSeq2[18,19] to initiate prioritization of noncoding mutations, which revealed hundreds of thousands of noncoding somatic mutations with potential functional implications. To discriminate amongst this large number of NCMs, we developed GECCO (Genomic Enrichment Computational Clustering Operation) to identify candidate NCMs that drive differential gene expression. This approach reduced the number of putative gene-proximal regulatory regions by three orders of magnitude to a set of high-confidence calls. Using GECCO, we identify novel recurrent mutations and interrogate expression data from matched tumors to find variants associated with changes in mRNA levels. We find significant differential expression of 16 genes associated with NCMs. For two of these genes, PTPRN2 and SLC12A8, we uncover previously unidentified clinical relevance in PDA. Specifically, we find that PTPRN2 expression level is an independent prognostic variable for overall patient survival. Pathway analysis of the genes associated with recurrent NCMs identifies known and novel PDA pathways. Furthermore, we find enrichment for mutations in specific regulatory regions, suggesting that NCMs may be acted upon by selection during tumor formation. Our analysis provides a model for tumor evolution via the formation and selection for alterations in noncoding regulatory elements of specific genes as a means of control over specific biological pathways.

RESULTS

To analyze NCMs in PDA, we selected all 405 patients with WGS data from the ICGC Pancreatic Cancer Genome Project. We determined the total number of somatic single nucleotide variants (SNVs) and small insertions or deletions (indels) for each patient, and retained those with at least 100 variants and a mutation load within 3 standard deviations of the mean (mean=7,937; range=1–440,471). This excluded samples with very few calls as well as hyper-mutated tumors with unlocalized replication defects (Fig. 1a, Supplementary Fig. 1; see Online Methods). In total, 2,248,158 SNVs/indels from 308 PDA patient samples were kept for analysis.

Figure 1

Identification of recurrent noncoding mutations in PDA

(a) The total number of single nucleotide variants (SNV) was plotted for each patient. (b) FunSeq2 was utilized to detect and characterize putative somatic noncoding mutations from 308 PDA whole genome sequences. Mutation counts for each functional category are displayed. (c) The number of cis-regulatory region (CRR) mutations (grey bars), and CRR/total SNV (black points) were plotted for each patient.

General features of GECCO

To discover the effect of noncoding mutations on PDA progression and patient outcome, we developed the computational pipeline GECCO (Fig. 2). GECCO begins by selecting noncoding mutations falling within The Encyclopedia of DNA Elements[20] (ENCODE)-defined transcription factor binding peaks, hereafter referred to as cis-regulatory regions (CRRs) because not all proteins profiled are transcription factors and some may be part of larger regulatory complexes. It then proceeds with downstream processing in two parallel modules. We define a “CRR class” to be all CRRs that are bound by the same DNA-binding protein (e.g. CTBP2, with 1781 CRRs across the genome) or by proteins involved in DNA-binding complexes (e.g. SUZ12, with 1618 CRRs across the genome). The first module of GECCO associates NCMs with proximal genes and uses permutation testing to identify highly mutated clusters that correlate significantly with changes in gene expression. The second module calculates the mutation rate of each CRR to determine which specific CRR classes are more commonly mutated in PDA.

Figure 2

GECCO (Genomic Enrichment Computational Clustering Operation) flowchart

GECCO utilizes noncoding somatic mutation calls from tumor whole genome sequencing data to identify clusters of mutations within 2kb of genes, including those that correlate with changes in gene expression. GECCO also calculates the mutation rate of gene regulatory regions and determines the strength of each regulatory region in terms of the effect on gene expression (expression modulation score, EMS). These data can then be used for pathway analysis of genes proximal to noncoding clusters and genes downstream of specific regulatory regions. The gene lists can also be interrogated for patient survival analysis when coupled to outcome data for detection of clinically relevant interactions.

In the second module, GECCO computes an expression modulation score (EMS) using coupled gene expression data to determine the regulatory impact of each CRR class. The EMS can be used to generate a rank-sorted list of CRRs based on the strength of their relative gene regulatory impact (such that the strongest activators and repressors fall at opposite ends of the list). Taken together, the results generated from GECCO provide information on the impact of NCMs on the expression level of individual genes and identify potential driver transcription factors. Finally, GECCO merges the results of both modules to perform pathway and clinical survival analysis, allowing novel insights into PDA biology and patterns of somatic mutations in cancer.

Prioritization of non-coding mutations

We first identified NCMs in the exact same genomic position in multiple patients and removed common human variants (MAF > 5% in 1000 Genomes Phase I) (Supplementary Table 1). This identified several variants reaching over 2% incidence (n ≥ 7 out of 308 patients) in the patient cohort (Supplementary Table 1). Among the 11 genes associated with these variants, 6 have been implicated in tumorigenesis, including WASF3[21], BNC2[22], ELMO1[23], GPR98[24], PDE3B[25] and SOX5[26]. Interestingly, 10 of 11 of these mutations were found in introns. However, none of the exactly recurrent mutations disrupted, or created, transcription factor-binding motifs (as defined by the JASPAR transcription factor binding profile database[27]) or fell within known regulatory elements. This analysis is consistent with several pan-cancer analyses that found few exactly recurrent mutations outside of the well-characterized TERT promoter mutations[11,12]. We extended this analysis by prioritizing NCMs by their association with functional annotations and clustering within regulatory elements. We used the FunSeq2 computational pipeline[18,19] as a high-level filter to remove common variants and identify putative somatic regulatory mutations with functional impact. One important benefit of this approach is that it relies on functional information and thus drastically reduces any biases resulting from non-homogeneous mutation rates across the genome. This initial round of filtering identified 301,596 potential somatic drivers across all 308 patients (mean=1,988; range=203–17,902) (Fig. 1b). 264,488 of the somatic NCMs fell within ENCODE-defined transcription factor-binding peaks, with the majority of the remaining mutations within enhancers (19,608) or DNaseI hypersensitive sites (DHSs) (14,572) (Fig. 1b). We focused our analysis on the 264,488 NCMs within the ENCODE-defined CRRs. There was a direct correlation between CRR mutation rate and total SNVs (Fig. 1c). In contrast, no correlations between CRR mutation rate and coding mutations in KRAS, TP53, CDKN2A, SMAD4, and ARID1A were observed (Supplementary Fig. 3).
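The first step of this section, finding exactly recurrent positions that exceed 2% incidence, can be illustrated with a minimal Python sketch. It is not the GECCO implementation: the function name, the `(patient_id, chrom, pos)` tuple layout and the strict "greater than 2%" comparison are assumptions made for illustration, and removal of common variants is assumed to have happened beforehand.

```python
from collections import defaultdict


def recurrent_positions(calls, n_patients, min_fraction=0.02):
    """Return {(chrom, pos): patient_count} for exactly recurrent sites.

    calls: iterable of (patient_id, chrom, pos) tuples (hypothetical layout).
    A site is reported when the fraction of patients carrying a mutation
    at that exact position is greater than min_fraction.
    """
    patients_by_site = defaultdict(set)
    for patient_id, chrom, pos in calls:
        # A set guarantees each patient is counted once per site,
        # even if the same call appears more than once in the input.
        patients_by_site[(chrom, pos)].add(patient_id)

    return {
        site: len(patients)
        for site, patients in patients_by_site.items()
        if len(patients) / n_patients > min_fraction
    }
```

In a cohort of 308 patients, 7 carriers correspond to about 2.3% and pass the threshold, whereas 6 carriers (about 1.9%) do not, which matches the n ≥ 7 cutoff quoted above.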

Analysis of cis-regulatory mutations

Starting with 264,488 candidate mutations, we used GECCO to focus our analysis on CRRs within 2kb of each gene (many of which overlap promoters), seeking to identify clusters of mutations in CRRs that directly impact gene expression (Fig. 3a). The requirement to be within 2kb of a gene excludes many distal enhancer regions but increases the likelihood that a given CRR topologically associates with, and therefore regulates, the expression of its proximal gene. The most frequently mutated CRR (17 patients, 5.52% of cohort) was in a TCF12-binding region proximal to LHX8 (LIM homeobox 8) (Fig. 3a). LHX8, a homeobox gene and regulator of craniofacial development, modulates the Hedgehog pathway, a known regulator of PDA pathogenesis[28]. We observed a cluster of mutations in an E2F1-binding region in proximity to BMP7 (bone morphogenetic protein 7). BMP7 is a TGF-β family member, with pleiotropic roles in development and cancer progression[29]. GECCO did not detect any recurrent variants in the TERT promoter, in concordance with a previous study that failed to detect TERT promoter mutations in 24 PDA samples[8]. To determine if the identified NCMs were within active promoters or enhancers in pancreatic cells, we interrogated H3K4me3 and H3K27ac regions from ENCODE in pancreatic carcinoma-derived PANC-1 cells. In PANC-1 cells, 37.6% of all transcription factor-binding peaks were found within active PANC-1-predicted promoters or enhancers. In contrast, 58.9% of recurrent NCMs (>5 patients) were found within at least one PANC-1-predicted active promoter or enhancer. The CRRs with recurrent NCMs did not differ significantly in size from those lacking recurrent NCMs. Therefore, recurrent NCMs are enriched in transcriptionally active regions of the genome in pancreatic cancer cells.
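The 2kb gene-proximity requirement and the cohort percentages quoted above can be sketched in a few lines of Python. This is an illustration only: the text does not state whether distance is measured from the gene body or the transcription start site, so the sketch assumes the gene body extended by 2kb on each side, with closed (inclusive) coordinates. Both function names are hypothetical.

```python
def is_gene_proximal(crr_start, crr_end, gene_start, gene_end, window=2000):
    """True if a CRR overlaps the gene interval padded by `window` bases.

    Assumes closed intervals on the same chromosome (an assumption made
    for this sketch, not a detail given in the text).
    """
    return crr_start <= gene_end + window and crr_end >= gene_start - window


def cohort_percentage(n_mutated, n_patients):
    """Percentage of the cohort carrying a mutation in a given CRR."""
    return 100.0 * n_mutated / n_patients
```

For example, `cohort_percentage(17, 308)` rounds to 5.52, the value reported for the TCF12-binding region near LHX8.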

Figure 3

Clustered gene-proximal mutations and pathways in PDA

(a) The most common mutational clusters across the patient cohort as determined by GECCO, with associated genes; Yes = knockdown promoted cell death in shRNA cancer cell line screen (P denotes PDA-specific); No = no evidence for effect on cell death in shRNA cancer cell line screen. (b) Most significant clusters when corrected for cluster size as determined by GECCO. (c) DAVID pathway analysis was used to identify regulatory processes and pathways from genes associated with recurrent NCMs.

We identified clusters of NCMs in regulatory regions of long intergenic non-protein coding RNAs (lncRNAs), including the oncogenic lncRNA Metastasis Associated Lung Adenocarcinoma Transcript 1 (MALAT1)[30], and in microRNAs, including the oncogenic miR-21[31] (Fig. 3a). To infer functional consequences of the most recurrently mutated gene-proximal CRRs, we used data from a published in vitro short hairpin RNA (shRNA) screen, which monitored survival in 102 cell lines, of which 13 were pancreas cancer-derived[32]. Knockdown of 6 (LHX8, LMX1B, PAX6, DMRTA2, VAX2, CDH15) of the top 15 genes was found to decrease cancer cell survival, providing potential functional relevance for these genes as cancer drivers (Fig. 3a). Knockdown of two genes, LMX1B and CDH15, showed selective killing of PDA cell lines amongst all cancers, suggesting tumor-specific vulnerabilities. To control for variable CRR size, we calculated a mutational frequency for each cluster harboring at least 5 mutations, defined as the number of mutations across all patients divided by the number of nucleotides spanning the cluster (Fig. 3b). The highest scoring result was an exactly recurrent mutation in the same genomic position in 5 patients, flanking the acyl-CoA oxidase-like gene ACOXL, a known susceptibility locus for chronic lymphocytic leukemia[33]. This mutation was not found to be within a known transcription factor-binding site as defined by JASPAR. We also identified a cluster of 5 mutations within 19 nucleotides proximal to the neuronal cell adhesion gene NRXN3, a regulator of glioma cell proliferation and migration[34]. While multi-cancer recurrent NCMs have been described[11,12], we lack an understanding of their mutational patterns. For example, it is unknown if NCMs cluster near the same genes that show recurrent coding mutations for a given disease. Therefore, we looked for clusters of NCMs in association with known PDA genes, present in at least 5 patients (Supplementary Table 2). 
We did not detect any recurrent NCMs in CRRs within 2kb of KRAS, TP53, CDKN2A, SMAD4, ARID1A and MLL3, in addition to 24 of 26 other PDA genes identified from previous whole exome analyses (Supplementary Table 2)[2,3]. This result is consistent with defects in protein function, rather than alterations in expression, in the pathogenesis of these PDA genes.
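The size-corrected mutational frequency described above (mutations across all patients divided by the number of nucleotides spanning the cluster) can be written as a short Python sketch. The function name is hypothetical, and treating the span as inclusive of both end positions is an assumption; the minimum of 5 mutations follows the text.

```python
def cluster_mutation_frequency(positions, min_mutations=5):
    """Mutations per nucleotide for one cluster, or None if too few.

    positions: genomic coordinates of every mutation in the cluster,
    pooled across patients (one entry per patient-level mutation).
    The span is counted inclusively (assumption for this sketch).
    """
    if len(positions) < min_mutations:
        return None
    span = max(positions) - min(positions) + 1
    return len(positions) / span
```

Under these assumptions, an exactly recurrent mutation seen in 5 patients spans a single nucleotide and scores 5.0, while 5 mutations spread over 19 nucleotides score 5/19 (about 0.26), so exact recurrence ranks highest.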

Novel clinical outcomes from pathway analysis

Pathway analysis of recurrently mutated PDA genes has been used to identify signaling networks and biological processes underlying disease pathogenesis[2,3]. To detect patterns in NCM localization at the pathway level, we utilized The Database for Annotation, Visualization and Integrated Discovery (DAVID), a functional annotation enrichment algorithm for large-scale biological datasets[35]. Pathway analysis of genes near CRRs containing clusters of mutations (>5 patients) identified significant enrichment of several gene families and regulatory processes, including transcriptional regulation, homeobox genes, axon guidance, cell adhesion and Wnt signaling (Fig. 3c). The involvement of three of these pathways (axon guidance, cell adhesion, Wnt signaling) in PDA has been identified from previous exome sequencing studies[2,3]. Furthermore, several homeobox genes and transcription factors have been implicated in PDA pathogenesis, including PAX6[36], HOXB2[37], HOXB7[38] and RUNX3[39]. Therefore, NCMs display preferential patterns of localization in the PDA genome and, although not found near canonical PDA genes, may act through modulation of canonical PDA pathways. In addition, we uncover a previously unrecognized localization of NCMs near transcriptional regulators and homeobox genes, suggesting a role for these factors in PDA. The availability of matched gene expression data from a large number (n=96) of patient samples allowed association studies between specific clusters of mutations and changes in gene expression. For each of the 124,075 CRRs we determined differential gene expression between patients with mutations in a proximal CRR compared to patients without mutations. Using permutation testing we identified NCMs that significantly impacted expression of their proximal gene and calculated their false discovery rates (for details, see Online Methods).
Many of the genes with the greatest number of mutations (Fig. 3a) did not reveal significant changes in gene expression. However, this analysis yielded 16 NCMs associated with significant changes in gene expression (≥3 patients, p<0.05, FDR<0.25) (Fig. 4a). Eight of the 16 NCMs were present in regions marked by H3K4me3 and H3K27ac in PANC-1 cells. None of the statistically significant mutations were associated with increases in gene expression. Three of the genes with statistically significant decreases in expression (KCNQ1, IKZF1, TUSC7) have been implicated as tumor suppressors[40,41], while two (PTPRN2, SNRPN) are frequently hypermethylated[42,43]. Next, we looked for correlations between NCM-associated differential expression and clinical correlates in PDA. The small sample size precluded identification of specific NCMs associated with differences in patient outcome. Therefore, we looked for associations between expression of these 16 genes and patient outcome. Low mRNA expression of the phosphatase PTPRN2 and the ion transporter SLC12A8 were associated with decreased overall survival and decreased disease-free survival in a univariate analysis, respectively (Fig. 4b,c). Furthermore, a multivariate analysis revealed PTPRN2 as an independent prognostic variable for overall survival (Supplementary Table 3).
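The kind of permutation test used to link a mutated CRR to expression of its proximal gene can be sketched as follows. This is a generic label-shuffling test on the difference in mean expression, not the procedure from the Online Methods: the test statistic, the two-sided comparison, the number of permutations and both function names are assumptions, and the false discovery rate calculation is not covered.

```python
import random
from statistics import mean


def mean_difference(values, is_mutated):
    """Mean expression in mutated patients minus mean in wild-type patients."""
    mut = [v for v, m in zip(values, is_mutated) if m]
    wt = [v for v, m in zip(values, is_mutated) if not m]
    return mean(mut) - mean(wt)


def permutation_pvalue(values, is_mutated, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the mutated vs. wild-type difference.

    values: expression of one gene, one entry per patient.
    is_mutated: booleans in the same order; needs at least one True
    and one False. Labels are shuffled n_perm times (sketch assumption).
    """
    rng = random.Random(seed)
    observed = mean_difference(values, is_mutated)
    labels = list(is_mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(mean_difference(values, labels)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)
```

A negative observed difference with a small p-value would correspond to the pattern reported here, where every significant NCM was associated with decreased expression.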

Figure 4

Recurrent gene-proximal mutations correlate with gene expression changes in PDA

(a) GECCO used gene expression data from matched PDA patients to correlate NCMs with changes in gene expression. “Mut allele” = mean expression of linked gene in patients with associated CRR mutations. “WT allele” = mean expression of linked gene in patients without associated CRR mutations. (b) Analysis of overall survival (OS) in PDA patients expressing high (upper 2/3) and low (lower 1/3) levels of PTPRN2. Purple dots represent patients with high expression of PTPRN2 “at risk” (alive). Red dots represent patients with low expression of PTPRN2 “at risk” (alive). (c) Analysis of disease-free survival (DFS) in PDA patients expressing high (upper 2/3) and low (lower 1/3) levels of SLC12A8. (d) Two A→C mutations in a regulatory site on chromosome 3 at positions 124,840,671 and 124,840,678 alter critical nucleotides in an IRF1 and/or PRDM1 binding site. The regulatory site lies in an intron of one isoform and promoter of an alternative isoform of SLC12A8. At the bottom, heat map displays predicted change in accessibility, considered here as DNase-seq signal in GM12865. The line plots above measure the maximum (gain) and minimum (loss) predicted change; the loss highlights nucleotides that significantly alter the overall signal upon mutation as both of these mutations do.

Mechanisms of NCM-modulated expression

To uncover mechanisms by which expression-correlated SNPs may influence transcription, we annotated mutations with their predicted influence on local DNase hypersensitivity using the software Basset[44] (see Online Methods). The predicted influences of these 55 SNPs were significantly greater in magnitude after Bonferroni correction than a null model of sampling from the full set in 160 out of 164 examined cell types. For example, two different mutations in IRF1 and PRDM1 motifs altered critical positions that likely debilitate binding within an intron of SLC12A8 (Fig. 4d). Additional mutations modulate an NRF1 motif in the promoter of SNRPN and a GATA motif adjacent to a PU.1 binding site in an intron of LSAMP (Supplementary Fig. 4). Therefore, GECCO enriches for NCMs with predicted effects on DNase hypersensitivity and transcription factor binding. While the Basset analysis identified NCMs predicted to affect DNase hypersensitivity, we sought to uncover NCMs directly modulating gene expression. To determine the functional relevance of specific NCMs, we performed luciferase reporter assays in non-transformed HEK-293 cells and the MiaPaCa2 and Suit2 PDA cell lines, comparing gene expression driven by wild type (WT) and mutant (MUT) sequences (Fig. 5). Among 11 regions tested, 7 (HEK-293) and 4 (MiaPaCa2, Suit2) mutations significantly altered luciferase expression. Importantly, NCMs associated with PTPRN2, PDPN, TUSC7, SNRPN and MTERF4 significantly decreased luciferase expression in one or multiple cell lines, consistent with decreased expression of these genes associated with NCMs in patient samples (Fig. 4a). Our validation rate was greater or comparable in terms of hit rate, and greater in terms of fold change, than other recent attempts to identify NCMs driving differential expression[15,16], highlighting the power of GECCO to identify functionally significant NCMs from millions of candidate mutations.

Figure 5

Noncoding mutations modulate luciferase gene expression

(a-c) Luciferase reporter assays of WT (black) and MUT sequences (white bars) are shown for selected NCMs associated with named genes. For each box-and-whisker plot, center line is the mean, box limits are min/max values, whiskers are s.d. Data from a representative experiment (n=3 replicates) with a total of n=4 independent transfected cultures for each cell line are shown. P values calculated by two-tailed unpaired t test. (*, p<0.05; **, p<0.01; ***, p<0.001)

Mutational and expression patterns of CRR classes

The second module of GECCO focuses on CRR classes, rather than individual genes, to identify mutational patterns and overall effects on gene expression of each CRR class (Figure 6). We computed the mutation rate for each CRR class correcting for element size and abundance in the genome. We found no significant effect of GC content on CRR class mutation rate. Noncoding mutations were specifically enriched in certain classes of gene-proximal CRRs (see Supplementary Note). Next, we sought to understand the molecular characteristics of each CRR class in terms of effect on gene expression. We calculated an expression modulation score (EMS) for each CRR class reflecting the impact of the presence of that CRR on the expression of the neighboring gene in relation to all other genes. This method compared, for each CRR class, mean expression of genes proximal to a CRR to those that are non-proximal. CRRs with strong predicted activating or repressing activity would be proximal to genes with expression levels substantially higher (for activators) or substantially lower (for repressors) than the basal genome expression level (Supplementary Table 4, see Online Methods). To determine if the strongest activators and repressors were enriched for those CRRs with the highest mutational frequencies, we considered any activator or repressor that was greater than 1 standard deviation from the mean EMS (12 activators, 9 repressors) (Fig. 6, green and orange bars). The mutational frequencies for each group (activators, repressors, all others with balanced expression) were then calculated and activators and repressors compared to the balanced group (p=0.02077 for activators vs. balanced; p=0.04982 for repressors vs. balanced). 
The CRR classes with the highest percentage of mutations across all PDA patients were enriched on either end of the spectrum (most repressive or most active), suggesting that recurrent NCMs are preferentially located in CRR classes with the strongest impact on gene expression. These highly active CRR classes have the largest effect on gene expression and may, therefore, confer a selective advantage to the cell. In addition, we noted that the 6 genes identified from the shRNA survival screen (Fig. 3a) were all associated with NCMs in highly repressive CRRs. In contrast, every gene that failed to score in the shRNA survival screen was associated with highly active CRRs (Fig. 3a).
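The idea behind the expression modulation score can be illustrated with a small Python sketch. The exact EMS formula is given in the Online Methods and is not reproduced in this section, so the score below, the difference between mean expression of proximal and non-proximal genes, is a stand-in assumption. The function names, the input layout and the use of the sample standard deviation are likewise hypothetical; only the 1-standard-deviation cutoff follows the text.

```python
from statistics import mean, stdev


def expression_modulation_scores(expression, proximal_genes_by_class):
    """Assumed EMS: mean(proximal genes) - mean(non-proximal genes).

    expression: {gene: expression level}.
    proximal_genes_by_class: {crr_class: set of genes near a CRR of that class}.
    """
    scores = {}
    for crr_class, proximal in proximal_genes_by_class.items():
        near = [v for g, v in expression.items() if g in proximal]
        far = [v for g, v in expression.items() if g not in proximal]
        if near and far:  # skip classes where either group is empty
            scores[crr_class] = mean(near) - mean(far)
    return scores


def classify_by_ems(scores, n_sd=1.0):
    """Label classes more than n_sd standard deviations from the mean EMS."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)  # stdev needs at least 2 classes
    labels = {}
    for crr_class, score in scores.items():
        if score > mu + n_sd * sd:
            labels[crr_class] = "activator"
        elif score < mu - n_sd * sd:
            labels[crr_class] = "repressor"
        else:
            labels[crr_class] = "balanced"
    return labels
```

The mutation frequencies of the resulting activator, repressor and balanced groups could then be compared, which is the comparison behind the p-values reported above.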

Figure 6

Gene-proximal NCMs are enriched in specific classes of CRRs

Percentage of CRRs with at least 2 mutations across the patient cohort, corrected for genome abundance and size, ordered from left to right by expression modulation score (EMS) (most repressive to most active). Dotted line represents mean mutation frequency across all CRRs.

Pathway dynamics between activating and repressing CRRs

Next, we investigated the patterns of noncoding SUZ12 mutations in our patient cohort, as SUZ12 had the highest repressive score and SUZ12 sites were frequently mutated (Supplementary Table 4, Fig. 6). We generated two distinct lists of SUZ12-associated genes. The first list contained those genes associated with recurrently mutated SUZ12 sites. The second list contained those genes associated with SUZ12 sites that never harbored recurrent NCMs. We then performed pathway analysis on each gene set to identify differences in biological functions (Fig. 7a). We found that genes without recurrent SUZ12 mutations were enriched in glycoproteins, intracellular signaling as well as the axon guidance/neuron differentiation pathway. In contrast, genes with recurrent SUZ12 mutations were more significantly enriched in homeobox genes, transcription factors, Wnt signaling, proto-oncogenes and the axon guidance/neuron differentiation pathway. Surprisingly, several categories, including glycoproteins, intracellular signaling and extracellular matrix, were completely absent within the mutant SUZ12 gene set. Therefore, there is specificity for the location of NCMs in PDA, not only for certain CRRs, but also for the corresponding cancer-associated genes and pathways.

Figure 7

Gene-proximal NCMs in repressors and activators cluster near distinct subsets of genes

(a) Pathway analysis of genes associated with recurrently mutated repressive (SUZ12, CTBP2, SETDB1) sites (red bars), versus those never harboring NCMs in those CRRs (blue bars). (b) Pathway analysis of genes associated with recurrently mutated activator (KAT2A, BCLAF1, TAF7, WRNIP1) sites (red bars), versus those never harboring NCMs in those CRRs (blue bars). AG/ND, axon guidance/neuron differentiation.

To further characterize pathways downstream of commonly mutated repressive CRRs, we performed pathway analysis on genes with and without associated CTBP2 mutations (Fig. 7a). Genes without CTBP2 noncoding mutations showed a similar pattern of pathway regulation as SUZ12. These pathways were markedly enriched in the gene set associated with CTBP2 mutations, while alternative splicing and glycoproteins were completely absent. We extended this analysis to another repressive CRR with a high mutational frequency, SETDB1 (Fig. 6a). Genes associated with recurrent NCMs in SETDB1 binding sites were enriched in axon guidance/neuron differentiation, cell adhesion and disease mutation pathways. Therefore, mutations in highly repressive CRRs are enriched in PDA and selectively associated with genes regulating a core set of biological processes. We performed a similar analysis for the commonly mutated activator CRRs, including KAT2A, BCLAF1, TAF7 and WRNIP1 (Fig. 7b) and again found specificity for the genes and pathways that are commonly mutated. For all CRRs, there were significant differences in the pathways regulated by genes with or without mutations in a given CRR. KAT2A, BCLAF1 and TAF7 shared a very similar pattern of pathway regulation, with significant increases in nucleosome assembly/organization, methylation and ubiquitin conjugation, all processes involved in chromatin dynamics. This suggests that genes associated with NCMs in transcriptional repressors regulate homeobox genes and PDA-associated pathways, while genes associated with NCMs in transcriptional activators may regulate transcriptional dynamics through modulation of chromatin states.

DISCUSSION

We developed a new computational method, GECCO, to systematically analyze the noncoding genome of PDA to uncover recurrent regulatory somatic mutations. We find patterns of NCMs associated with genes regulating canonical PDA pathways, but not associated with commonly mutated PDA genes. Therefore, NCMs may serve as a novel mechanism in cancer cells for regulating pathways critical for tumorigenesis. Furthermore, GECCO uncovers mutations correlated with changes in gene expression, including several known tumor suppressors and aberrantly methylated genes. GECCO produces a set of high-confidence calls that enrich for predicted effects on DNase hypersensitivity and transcription factor binding, as well as functional effects on gene expression, as experimentally demonstrated by luciferase reporter assays. We find enrichment for NCMs in specific CRRs and distinct subsets of pathways associated with NCMs in highly repressive and transcriptionally active CRRs as identified by our EMS algorithm. To our knowledge, this is the first comprehensive analysis of noncoding alterations in PDA, providing novel insights into PDA pathogenesis and serving as a counterpart to the information gleaned from large-scale exome sequencing projects[2,3]. Mutational analysis of patient tumors is increasingly informing treatment decisions, whereas complementary techniques, including microarray, RNA sequencing, fluorescence in situ hybridization and immunohistochemistry, are required to analyze changes in gene or protein expression of cancer drivers that lack coding mutations. As somatic mutations in DNA regulatory elements can alter gene expression of cancer drivers, targeted or whole genome sequencing may provide clinically useful information for these patients, both in terms of therapeutic decisions and clinical prognosis. Our analysis provides the first collection of NCMs that correlate with changes in gene expression in PDA.
Furthermore, we uncover clinical outcome relationships for PTPRN2 and SLC12A8, neither of which has previously been implicated in PDA. Functional validation of NCM-gene expression associations is a critical step in evaluating the robustness of an analysis pipeline. Our luciferase reporter assay experiments demonstrated that GECCO has a higher validation rate in cancer cell lines than any recent study of NCMs[15,16]. Furthermore, the validation rate in HEK-293 cells, a standard cell line for luciferase assays, was 64%, concordant with the expected false discovery rate. Finally, GECCO accurately predicted the directionality of gene expression changes associated with NCMs. NCMs associated with PTPRN2, PDPN, TUSC7, SNRPN and MTERF4 significantly decreased luciferase expression in one or multiple cell lines, consistent with decreased gene expression of these genes associated with NCMs in patient samples. This is in contrast to a recent report where the directionality of gene expression changes in the luciferase assay was not consistent with the predicted response[16]. Therefore, GECCO represents a significant improvement in the ability to identify functionally relevant NCMs. Pathway analysis of the gene lists generated by GECCO revealed several unexpected findings. Strikingly, we found that the most highly recurrent somatic NCMs were located near genes in known PDA-associated pathways, including axon guidance, cell adhesion and Wnt signaling, but not the most commonly mutated PDA genes. This suggests that NCMs may drive tumor progression through modulation of PDA-specific pathways, providing an alternative route for pathway activation and a novel mechanism of tumorigenesis. Furthermore, we provide evidence that NCMs in specific regulatory element classes are selected for during tumor evolution. These highly mutated regulatory element classes are predominantly those with the greatest impact on gene expression.
Therefore, clusters of NCMs are enriched in gene-proximal regions with the greatest regulatory impact, again providing evidence for selection during tumorigenesis. Pathway analysis of genes near NCMs within these highly mutated regulatory regions shows selectivity for PDA pathways. These pathways are not enriched when analyzing genes without associated clusters of NCMs, again arguing in favor of selection. Interestingly, many transcriptional regulators bind selectively to different regions of the genome in malignant versus non-neoplastic cells[45]. We propose that NCMs found within promoters of PDA pathway genes modify regulatory factor binding to alter gene transcription, thereby providing an additional mechanism to promote cancer.

ONLINE METHODS

1. Data Acquisition

All data used in this analysis were downloaded from the International Cancer Genome Consortium (ICGC) data portal (https://dcc.icgc.org/projects). At our last date of access (Feb 11, 2015), simple somatic mutations (SSM) for 405 pancreatic ductal adenocarcinoma samples were available from the Australian (PACA-AU) and Canadian (PACA-CA) groups. We downloaded the clinical data, SSMs, and, when available, sequence-based gene expression (EXP-S) data for all 405 patients.

2. Pre-processing

The whole genome sequencing (WGS) required to call SNVs across all 405 patients and the whole genome RNA-sequencing required to calculate gene expression were carried out by two distinct consortiums, one Canadian and one Australian. All SNV calls (SSMs) and gene expression calculations (EXP-S) by these two groups were consolidated by ICGC.

2.1. SNV calls from whole genome sequencing

For each of the 405 patients we extracted the chromosome, start location, end location, somatic allele, and mutated allele from the list of simple somatic mutations (file: ssm_open.tsv) and converted them to bed format. Many of the SNVs were redundant within patients. For each patient, the list of SNVs was sorted by genomic coordinates and consolidated to contain only a single entry for each unique SNV. A subset of patients had extremely low numbers of SNVs (likely due to poor sequencing results) or high numbers of SNVs (likely due to hyper-mutated regions, unlocalized replication defects, or microsatellite instability). Across all 405 patients the number of unique SNVs ranged from 1 to 440,471, with a mean of 7,937 and a standard deviation of 26,224. In order to remove outliers we eliminated all patients with fewer than 100 SNVs (92 patients in total) or an SNV count more than 3 standard deviations away from the mean (5 patients in total). This left 308 patients with a mean SNV count of 7,300, ranging from 1,040 to 68,885.
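The deduplication and outlier filter described above can be sketched as follows. This is a minimal illustration and not the GECCO source: the function names and the tuple layout are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which was used. The mean and standard deviation are taken over the full cohort before any patient is removed, as the paragraph implies.

```python
from statistics import mean, stdev


def unique_sorted_snvs(snvs):
    """Collapse redundant SNVs and sort them by genomic coordinates.

    Each SNV is a tuple (chrom, start, end, ref_allele, mut_allele).
    Chromosome names sort lexicographically here, which is a simplification.
    """
    return sorted(set(snvs))


def filter_patients(snv_counts, min_snvs=100, n_sd=3.0):
    """Keep patients with at least `min_snvs` unique SNVs whose count lies
    within `n_sd` standard deviations of the cohort mean.

    `snv_counts` maps a patient id to its number of unique SNVs.
    """
    counts = list(snv_counts.values())
    cohort_mean = mean(counts)
    cohort_sd = stdev(counts)  # assumption: sample standard deviation
    kept = {}
    for patient, n in snv_counts.items():
        if n < min_snvs:
            continue  # too few SNVs, likely poor sequencing
        if abs(n - cohort_mean) > n_sd * cohort_sd:
            continue  # extreme count, likely hyper-mutated
        kept[patient] = n
    return kept
```

Applied to the counts reported above, a filter of this kind would reduce the 405 patients to the 308 retained for analysis.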

2.2. Gene expression (FPKM) from whole genome RNA-sequencing

Of the 308 patients that passed the previous filtering step, 96 had expression data available from ICGC. For each of the 96 patients, we extracted the normalized read count (FPKM) and Ensembl gene id (file: exp_seq.tsv). While the vast majority of genes have expression data across all 96 patients, there were several thousand Ensembl genes that only contained expression data for a subset of patients. In order to streamline and simplify downstream analysis we kept only the 50,861 Ensembl genes that were shared by all 96 patients. In addition, there were three patients (DO33168, DO35098, DO35100) that had gene expression from either 2 or 3 independently sequenced samples. For these three patients, the gene expression for each gene was calculated by taking the mean across all samples.
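A minimal sketch of this consolidation step is shown below. It assumes the expression file has already been parsed into (patient, sample, gene, FPKM) records; the function name and record layout are hypothetical and are not taken from the ICGC file format.

```python
from collections import defaultdict


def consolidate_expression(records):
    """Average FPKM across samples of the same patient and keep only the
    genes that have expression data for every patient.

    `records` is an iterable of (patient_id, sample_id, gene_id, fpkm),
    with one record per sample and gene.
    Returns {gene_id: {patient_id: mean_fpkm}}.
    """
    values = defaultdict(lambda: defaultdict(list))
    patients = set()
    for patient, _sample, gene, fpkm in records:
        values[gene][patient].append(fpkm)
        patients.add(patient)

    shared = {}
    for gene, by_patient in values.items():
        if set(by_patient) != patients:
            continue  # gene is missing for at least one patient
        shared[gene] = {p: sum(v) / len(v) for p, v in by_patient.items()}
    return shared
```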

3. Analyzing noncoding variants with GECCO

In order to identify potential noncoding cancer drivers, we first used FunSeq2 (v2.1.0) as a high-level filter to prioritize our SNVs. The unique SNVs for each of the 308 patients were converted to bed format and analyzed by FunSeq2 using the command ./run.sh -inf bed -n to identify only noncoding variants. This analysis pipeline requires a suite of annotation data that is used to make calls and score noncoding variants. These were downloaded from http://funseq2.gersteinlab.org/data/. One of these files, “ENCODE.annotation.gz”, contains the full list of TFPs/CRRs used in our analysis along with their exact genomic coordinates.

3.1 Processing recurrently mutated cis-regulatory regions (CRRs)

FunSeq2 generates a number of output files including Recur.Summary, which contains a list of all noncoding elements, the genomic coordinates of these elements, the fraction of patients with a mutation in each element, and the full list of patient names along with the genomic locations of each mutation. While the ENCODE annotation data provides a number of different noncoding elements (enhancers, transcription factor binding sites (TFPs), DNase hypersensitivity, etc.), we chose to focus our analysis on TFPs (referred to in this manuscript as CRRs), as they were the most highly represented class of elements identified. CRR-proximal genes were found by intersecting CRRs with genes that had been expanded by 2kb at their 5’ and 3’ ends.
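The proximity rule in the last sentence amounts to an interval overlap test after padding each gene by 2kb on both sides. The sketch below illustrates it with a simple linear scan; the function name and tuple layouts are hypothetical, and half-open bed-style coordinates are assumed.

```python
def proximal_genes(crr, genes, flank=2000):
    """Return ids of genes whose span, expanded by `flank` bp at both the
    5' and 3' ends, overlaps the CRR.

    `crr` is (chrom, start, end); `genes` is an iterable of
    (gene_id, chrom, start, end). Coordinates are half-open, as in bed.
    """
    chrom, start, end = crr
    hits = []
    for gene_id, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        padded_start = max(0, g_start - flank)
        padded_end = g_end + flank
        # Two half-open intervals overlap when each starts before the other ends.
        if start < padded_end and end > padded_start:
            hits.append(gene_id)
    return hits
```

For genome-scale input, a sorted sweep or an interval tree would be a more efficient choice than this scan.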

3.2 Calculating CRR mutation rates

As described above, the full list of CRRs (121 distinct CRR classes in total) including their counts and genomic positions can be found in “ENCODE.annotation.gz.” GECCO makes two separate calculations across all 121 CRR classes using the CRR genomic information: (1) the fraction of distinct CRR sites within a given class that are mutated, and (2) the base-level mutation rate for each CRR class (the number of mutations in all CRRs of a given class divided by the total number of base pairs of all CRRs in that class). For an individual CRR, there are three ways in which GECCO calculates the mutational frequency: (1) by summing the number of mutations in a given CRR, (2) by calculating the fraction of bases in the CRR that are mutated (i.e. mutation counts normalized by CRR length), or (3) by calculating the fraction of mutated bases in a CRR mutation cluster. Option (3) is computed by first determining the cluster size within a CRR, that is, the number of bases required to span all mutations in a given CRR. For example, consider a 2kb CRR with 9 mutations. If the two most distantly separated of the 9 mutations are 100 bp apart, then the length of the mutation cluster is 100 bp. The mutational frequency of the cluster is then computed by dividing the number of mutations in that cluster by the size of the cluster (9/100 = 9.0%). This approach weights exactly recurrent or proximal mutations more strongly than distant mutations.
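The three per-CRR measures can be illustrated with a short sketch that reproduces the worked example (9 mutations spanning 100 bp in a 2kb CRR). The function name is hypothetical. The text does not say how a cluster of exactly recurrent mutations (span of 0 bp) is handled, so the floor of 1 bp used here is an assumption made only to avoid division by zero.

```python
def crr_mutation_metrics(crr_start, crr_end, positions):
    """Compute three mutational frequency measures for one CRR.

    `positions` lists the genomic coordinate of every mutation in the CRR,
    with one entry per mutated patient, so repeated coordinates are allowed.
    """
    if not positions:
        raise ValueError("the CRR has no mutations")
    count = len(positions)                      # option (1): raw count
    crr_length = crr_end - crr_start
    span = max(positions) - min(positions)      # distance between the two extremes
    cluster_size = max(span, 1)                 # assumption: floor of 1 bp
    return {
        "count": count,
        "per_base": count / crr_length,         # option (2)
        "cluster_size": cluster_size,
        "cluster_frequency": count / cluster_size,  # option (3)
    }
```

With nine positions between 500 and 600 in a CRR covering 0 to 2,000, `cluster_frequency` is 9/100 = 0.09, while `per_base` is only 9/2000 = 0.0045, which shows how the cluster measure rewards tightly grouped mutations.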

4. Pathway analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID), a functional annotation enrichment algorithm for large-scale biological datasets was used for pathway analysis, with the following annotation categories: SP_PIR_KEYWORDS, GOTERM_BP_FAT, KEGG_PATHWAY, PANTHER_PATHWAY, SMART. A Bonferroni corrected p-value of 0.05 was used as a cutoff for enrichment significance.

5. Survival analysis

Median survival was estimated using the Kaplan-Meier method and the difference was tested using the log-rank test. P values of less than 0.05 were considered statistically significant. Clinico-pathologic variables with a P value of less than 0.25 on the log-rank test were entered into Cox proportional hazards multivariate analysis, and redundant variables were eliminated using a backward elimination method. Statistical analysis was performed using StatView 5.0 Software (Abacus Systems, Berkeley, CA, USA). Overall survival (OS) or disease-free survival (DFS) was used as the primary endpoint. High PTPRN2 expression was defined as an expression level > 4.98, and high SLC12A8 expression as an expression level > 7.03.
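For readers unfamiliar with the Kaplan-Meier method, the sketch below shows how patients could be split at an expression cutoff and how a median survival is read off the estimated curve. It is a plain-Python illustration only: the authors used StatView, the function names are hypothetical, and the log-rank test and Cox model are not reproduced here.

```python
def split_by_expression(expression, cutoff):
    """Label each patient 'high' if expression > cutoff, else 'low'."""
    return {p: ("high" if x > cutoff else "low") for p, x in expression.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability),
    one entry per distinct time at which an event occurred.

    `events[i]` is 1 if the event (e.g. death) was observed at `times[i]`
    and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, or None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

A call such as `split_by_expression(ptprn2_levels, 4.98)` (where `ptprn2_levels` is a hypothetical patient-to-expression mapping) would produce the two groups whose curves are then compared.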

6. Computing differential expression

Differential expression was computed using permutation testing for each recurrently mutated CRR that was within 2kb of an Ensembl gene. For each CRR/gene pair, the 96 patients with mutation data were split into two groups: patients with mutations in the CRR and patients without mutations in the CRR. Using the expression data downloaded from ICGC for the gene of interest, a t-test is performed to generate a single t-value, the observed t-value. The expression values for patients with mutations in CRRs and the expression values for patients without mutations are then permuted 100,000 times to generate 100,000 additional t-values, the permuted t-values. These t-values generally fit a Gaussian distribution, to which the observed t-value is then compared using a two-tailed test. The empirical p-value is computed as the fraction of times (x/100,000) that a “permuted t-value” falls further outside the Gaussian distribution than the “observed t-value”. Once p-values have been calculated for all recurrently mutated genes proximal to CRRs, GECCO estimates q-values (the false discovery rate) for each call. This is done using the “qvalue” package in R and measures the proportion of false positives incurred given the p-value distribution.
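The permutation scheme can be sketched as below. Several details are assumptions rather than the authors' implementation: the text says only "a t-test", so Welch's unequal-variance statistic is used here; the empirical p-value is taken directly as the fraction of permuted |t| values at least as large as the observed |t|, without fitting a Gaussian; and the function names and the `seed` parameter are hypothetical. Each group needs at least two patients for the variance to be defined. The q-value step performed in R is not reproduced.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant of the t-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se


def permutation_p_value(mutated, wild_type, n_perm=100_000, seed=0):
    """Return (observed t, two-tailed empirical p-value).

    `mutated` and `wild_type` hold the expression of one gene in patients
    with and without a mutation in the CRR. Group labels are shuffled
    `n_perm` times while the group sizes stay fixed.
    """
    rng = random.Random(seed)
    observed = welch_t(mutated, wild_type)
    pooled = list(mutated) + list(wild_type)
    k = len(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:k], pooled[k:])) >= abs(observed):
            hits += 1
    return observed, hits / n_perm
```

A negative observed t-value indicates lower expression in the mutated group, which is how the predicted direction of an expression change can be read from the same calculation.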

7. Luciferase Reporter Assay and Statistics

150 base pair sequences surrounding specific NCMs (wild type, WT or mutant, MUT) were synthesized (Integrated DNA Technologies) and cloned into pGL4.23 (Promega), containing a minimal promoter driving firefly luciferase. Five thousand cells per well (HEK-293, MiaPaCa2 or Suit2) were co-transfected in 96-well format with the specific WT or MUT vector and pRL-SV40P (Renilla luciferase, Addgene #27163) as a normalization control. Luciferase activity was measured 48 hours post-transfection with the Dual-Luciferase Reporter Assay System (Promega). Values reported are firefly luciferase divided by Renilla luciferase. Analytical statistics were generated in Prism 7.0 (GraphPad), and P values are from two-tailed unpaired t tests. All cell lines were obtained from ATCC and tested for mycoplasma contamination.

8. Computing Expression Modulation Scores (EMS)

Some CRRs bind transcription factors or transcription factor components with well-known expression modulation, including SUZ12 and CTBP2, which act as transcriptional repressors, or BDP1 and BRF1, which act as transcriptional activators. However, many of the 121 CRRs used in this study have unexplored or unvalidated directions of expression modulation. We developed a method to infer the direction and effect of expression modulation for each CRR class by comparing the expression of genes proximal to CRRs in a given CRR class to the mean expression of all other active genes in the genome. Many genes are inactive in any given tissue, and in a given RNA-seq experiment ~50% of genes show low to no expression. For all 96 patients with expression data, we found this also to be true, with ~50% of genes showing 0 expression. When computing the expression modulation for each CRR class we ignored all genes that showed 0 expression in at least 90% of patients (86 patients or more). For a given CRR class and for each of the 96 patients we compute (1) the mean expression of all genes proximal to CRRs in that class and (2) the mean expression of all genes non-proximal to a CRR in that class. For a given CRR class we then compute the log of the ratio between (1) and (2) for each of the 96 patients and then take the mean of the log ratio for all 96 patients to get a single “expression modulation score” for each CRR class. The log of the ratio will be negative if the mean expression of genes proximal to a CRR class is lower than the genome average (repression) and will be positive if the mean expression of genes proximal to a CRR class is higher than the genome average (activation). This calculation is not meant to generate an absolute numerical score for the repressive or activating activity of a CRR but is instead used to generate a rank-sorted list of CRR classes based on their expression modulation.
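The score for one CRR class can be sketched as follows. The function name and input layout are hypothetical, and the base of the logarithm is not stated in the text, so log2 is an assumption (the choice of base does not change the rank order of classes). The zero-expression cutoff is taken as floor(0.9 × number of patients), which gives the 86 of 96 patients quoted above.

```python
import math


def expression_modulation_score(expr, proximal, zero_fraction=0.9):
    """Mean over patients of log2(mean proximal expression / mean
    non-proximal expression) for one CRR class.

    `expr` maps gene id -> list of expression values, one per patient, in
    the same patient order for every gene. `proximal` is the set of gene
    ids proximal to a CRR of the class. Both per-patient means must be
    positive for the logarithm to be defined.
    """
    n_patients = len(next(iter(expr.values())))
    max_zeros = math.floor(zero_fraction * n_patients)
    # Drop genes with zero expression in at least `max_zeros` patients.
    active = {g: v for g, v in expr.items()
              if sum(x == 0 for x in v) < max_zeros}

    log_ratios = []
    for i in range(n_patients):
        prox = [v[i] for g, v in active.items() if g in proximal]
        rest = [v[i] for g, v in active.items() if g not in proximal]
        ratio = (sum(prox) / len(prox)) / (sum(rest) / len(rest))
        log_ratios.append(math.log2(ratio))
    return sum(log_ratios) / len(log_ratios)
```

A positive score suggests activation and a negative score suggests repression; sorting the 121 classes by this value gives the rank-sorted list described above.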

9. Basset Analysis

Basset is a recently introduced method based on convolutional neural networks to accurately predict DHSs from DNA sequence, thus enabling annotation of the influence of mutations on accessibility[44]. We trained the Basset deep convolutional neural network on DHSs from 164 cell types mapped by the ENCODE and Roadmap Epigenomics projects. From this, we predicted the influence of variants on the presence of DNase hypersensitivity in each cell type by computing the difference between predictions on sequences with each allele. Candidate high-impact variants were further analyzed for interruption of known binding sites by converting the first-convolution-layer filters learned by Basset to probabilistic position weight matrices, counting nucleotide occurrences in the set of sequences that activate the filter to a value that is more than half of its maximum value. We identified the likely binding protein for the motifs by querying the CIS-BP database[46] (accessed on June 12, 2015) using the TomTom v4.10.1 search tool[47] and requiring an FDR q-value < 0.1.

Supplementary Figure 1 – Identification of recurrent noncoding mutations in PDA. Distribution of SNV rates across the patient cohort.

Supplementary Figure 2 – Overlap of SNVs and common coding mutations in PDA. Distribution of SNVs across the patient cohort, with common coding mutations (colored bars) in PDA genes.

Supplementary Figure 3 – Overlap of gene-proximal NCMs in CRRs and common coding mutations in PDA. Distribution of CRR mutation rates across the patient cohort, with common coding mutations (colored bars) in PDA genes.

Supplementary Figure 4 – NCMs disrupt transcription factor binding motifs. (a) A G→A mutation in a regulatory site on chromosome 15 at position 25,200,056 alters a critical nucleotide in an NRF1 binding site. The regulatory site lies in the promoter of SNRPN. At the bottom, the heat map displays the predicted change in binding, considered here as ChIP-seq signal for NRF1 in H1-hESCs. The line plots above measure the maximum (gain) and minimum (loss) predicted change; the loss highlights nucleotides that significantly alter the overall signal upon mutation, as this mutation does. (b) A G→T mutation in a regulatory site on chromosome 3 at position 115,757,580 introduces a GATA factor binding site near an established PU.1 binding site. The heat map displays the predicted change in accessibility, considered here as DNase-seq signal in K562. In other cells, such as monocytes, the model predicts reduced accessibility, suggesting that GATA binding here may alter the combinatorial logic of the regulatory element in a complex fashion.

Supplementary Table 1 – Genome-wide exactly recurrent mutations in PDA. The most common exactly recurrent mutations across the patient cohort. Sequence of mutant allele in parentheses.

Supplementary Table 2 – Distribution of gene-proximal NCMs near known PDA genes. Analysis of the association of NCM clusters as determined by GECCO with known PDA genes.

Supplementary Table 3 – PTPRN2 multivariate analysis. Multivariate analysis of clinico-pathological variables and PTPRN2 expression in the patient cohort.

Supplementary Table 4 – CRR expression modulation scores. Effect of CRR on activity of neighboring gene compared with all other genes in the genome (see Online Methods for analysis details). EM Score, expression modulation score.

47 in total

1. JASPAR: an open-access database for eukaryotic transcription factor binding profiles.

Authors: Albin Sandelin; Wynand Alkema; Pär Engström; Wyeth W Wasserman; Boris Lenhard
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. RUNX3 Controls a Metastatic Switch in Pancreatic Ductal Adenocarcinoma.

Authors: Martin C Whittle; Kamel Izeradjene; P Geetha Rani; Libing Feng; Markus A Carlson; Kathleen E DelGiorno; Laura D Wood; Michael Goggins; Ralph H Hruban; Amy E Chang; Philamer Calses; Shelley M Thorsen; Sunil R Hingorani
Journal: Cell Date: 2015-05-21 Impact factor: 41.582

3. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element.

Authors: Marc R Mansour; Brian J Abraham; Lars Anders; Alla Berezovskaya; Alejandro Gutierrez; Adam D Durbin; Julia Etchin; Lee Lawton; Stephen E Sallan; Lewis B Silverman; Mignon L Loh; Stephen P Hunger; Takaomi Sanda; Richard A Young; A Thomas Look
Journal: Science Date: 2014-11-13 Impact factor: 47.728

4. Lhx6 and Lhx8 coordinately induce neuronal expression of Shh that controls the generation of interneuron progenitors.

Authors: Pierre Flandin; Yangu Zhao; Daniel Vogt; Juhee Jeong; Jason Long; Gregory Potter; Heiner Westphal; John L R Rubenstein
Journal: Neuron Date: 2011-06-09 Impact factor: 17.173

5. International network of cancer genome projects.

Authors: Thomas J Hudson; et al. (International Cancer Genome Consortium)
Journal: Nature Date: 2010-04-15 Impact factor: 49.962

6. MiR-132, miR-15a and miR-16 synergistically inhibit pituitary tumor cell proliferation, invasion and migration by targeting Sox5.

Authors: Wang Renjie; Liang Haiqian
Journal: Cancer Lett Date: 2014-10-08 Impact factor: 8.679

7. Recurrent somatic mutations in regulatory regions of human cancer genomes.

Authors: Collin Melton; Jason A Reuter; Damek V Spacek; Michael Snyder
Journal: Nat Genet Date: 2015-06-08 Impact factor: 38.330

8. Genome-wide analysis of noncoding regulatory mutations in cancer.

Authors: Nils Weinhold; Anders Jacobsen; Nikolaus Schultz; Chris Sander; William Lee
Journal: Nat Genet Date: 2014-09-28 Impact factor: 38.330

9. Identification of a panel of sensitive and specific DNA methylation markers for squamous cell lung cancer.

Authors: Paul P Anglim; Janice S Galler; Michael N Koss; Jeffrey A Hagen; Sally Turla; Mihaela Campan; Daniel J Weisenberger; Peter W Laird; Kimberly D Siegmund; Ite A Laird-Offringa
Journal: Mol Cancer Date: 2008-07-10 Impact factor: 27.401

10. WASF3 regulates miR-200 inactivation by ZEB1 through suppression of KISS1 leading to increased invasiveness in breast cancer cells.

Authors: Y Teng; Y Mei; L Hawthorn; J K Cowell
Journal: Oncogene Date: 2013-01-14 Impact factor: 9.867
