Rebecca C Poulos, Dilmi Perera, Deborah Packham, Anushi Shah, Caroline Janitz, John E Pimanda, Nicholas Hawkins, Robyn L Ward, Luke B Hesson, Jason W H Wong.
Abstract
BACKGROUND: Genetic testing of cancer samples primarily focuses on protein-coding regions, despite most mutations arising in noncoding DNA. Noncoding mutations can be pathogenic if they disrupt gene regulation, but the benefits of assessing promoter mutations in driver genes by panel testing have not yet been established. This is especially the case in colorectal cancer, for which few putative driver variants at regulatory elements have been reported.
Year: 2019 PMID: 31360895 PMCID: PMC6649856 DOI: 10.1093/jncics/pkz012
Source DB: PubMed Journal: JNCI Cancer Spectr ISSN: 2515-5091
Figure 1. Sequence coverage by target capture sequencing (TCS) and mutation characteristics. A) Types and sizes of regions sequenced by TCS. B) Average per sample read coverage across sequenced bases in cancer and matched normal TCS samples. Read coverage is plotted for each region type. Dotted lines indicate average read coverage in cancer and matched normal samples across the cohort. C) Somatic single-nucleotide mutation rate per megabase (Mb) of each sample in the TCS cohort (n = 95), plotted on a log scale (y-axis). Colors represent colorectal cancer subtypes as indicated, and somatic single-nucleotide and indel mutations in colorectal cancer driver genes of interest are marked by bars. POLE Exonuc = Polymerase Epsilon exonuclease domain mutation; trunc = frameshift or stop-gain mutation; ns = nonsynonymous mutation. D) Numbers of indels identified in microsatellite unstable (MSI) and microsatellite stable (MSS) colorectal cancer samples sequenced by TCS. Individual samples are indicated by dots corresponding to the number of indels identified by at least two variant detectors. E) Average somatic single-nucleotide mutation rate (mutations per Mb) in cancer samples across region types sequenced by TCS. Data from somatic single-nucleotide mutations in The Cancer Genome Atlas (TCGA) whole-exome sequenced (WXS) colorectal cancer samples are shown, as indicated on the rightmost bars on the graph. WXS samples harboring nonsynonymous POLE coding variants are excluded, and coding exons are here derived from GENCODE (35) v29 data. F) Average variant allele frequency (VAF) of somatic single-nucleotide mutations overlapping exons of sequenced driver genes and VAF of all other mutations. In all plots, mid-line and error bars indicate mean and standard deviation, and ****P < .0001.
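The per-sample quantities plotted in panels C, E, and F can be illustrated with a minimal sketch. It assumes the conventional definitions (mutations divided by megabases sequenced; alternate-allele reads divided by total reads at the site), which the caption does not spell out. The function names and the example numbers are hypothetical and are not taken from the study.

```python
def mutation_rate_per_mb(n_mutations: int, bases_sequenced: int) -> float:
    """Somatic mutations per megabase of sequenced territory (assumed definition)."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return n_mutations / (bases_sequenced / 1_000_000)


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical sample: 30 somatic SNVs across 1.5 Mb of captured sequence.
rate = mutation_rate_per_mb(30, 1_500_000)
# Hypothetical site: 12 variant-supporting reads out of 48.
vaf = variant_allele_frequency(12, 48)
```

Normalizing by the size of each region type is what makes rates comparable between small captured regions such as promoters and larger ones such as coding exons.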
Summary of driver mutations in genes sequenced, with more than 5% recurrence in the sequenced cohort*
| Gene | MSS (n = 77) | MSI (n = 15) | POLE mut. (n = 3) | Total (n = 95) |
|---|---|---|---|---|
| APC | 58 (75.3%) | 5 (33.3%) | 3 (100.0%) | 66 (69.5%) |
| TP53 | 53 (68.8%) | 2 (13.3%) | 2 (66.7%) | 57 (60.0%) |
| KRAS | 22 (28.6%) | 2 (13.3%) | 1 (33.3%) | 25 (26.3%) |
| FBXW7 | 11 (14.3%) | 4 (26.7%) | 2 (66.7%) | 17 (17.9%) |
| PIK3CA | 10 (13.0%) | 2 (13.3%) | 3 (100.0%) | 15 (15.8%) |
| BRAF | 6 (7.8%) | 8 (53.3%) | 0 (0.0%) | 14 (14.7%) |
| TCF7L2 | 7 (9.1%) | 1 (6.7%) | 2 (66.7%) | 10 (10.5%) |
| SMAD4 | 8 (10.4%) | 2 (13.3%) | 0 (0.0%) | 10 (10.5%) |
| RNF43 | 1 (1.3%) | 7 (46.7%) | 1 (33.3%) | 9 (9.5%) |
| SOX9 | 7 (9.1%) | 1 (6.7%) | 0 (0.0%) | 8 (8.4%) |
| ARID1A | 2 (2.6%) | 4 (26.7%) | 2 (66.7%) | 8 (8.4%) |
| PTPRK | 2 (2.6%) | 4 (26.7%) | 1 (33.3%) | 7 (7.4%) |
| PMS1 | 3 (3.9%) | 1 (6.7%) | 1 (33.3%) | 5 (5.3%) |
| MSH6 | 1 (1.3%) | 2 (13.3%) | 2 (66.7%) | 5 (5.3%) |
*The number of samples with at least one somatic single-nucleotide or indel mutation in each gene is shown. For a mutation to be considered a driver, it must be either a frameshift or stop-gain mutation, or a missense mutation with a PROVEAN (37) converted rankscore of more than 0.5, a PolyPhen (38) prediction of “possibly damaging” or “probably damaging,” or a SIFT (39) prediction of “deleterious.” The percentage listed indicates the fraction of the subtype or total cohort that harbored at least one mutation fulfilling these criteria. MSI = microsatellite unstable sample; MSS = microsatellite stable sample; POLE mut. = Polymerase epsilon exonuclease domain-mutated MSS sample.
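The driver criteria in the table footnote, and the way each table cell is tallied, can be sketched roughly as follows. This is an illustrative sketch, not the authors' pipeline: the `Variant` fields, the consequence vocabulary, and the sample data are hypothetical, and for simplicity any of the three damaging labels from either PolyPhen or SIFT is treated as a positive call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TRUNCATING = {"frameshift", "stop_gain"}  # hypothetical consequence terms
DAMAGING_LABELS = {"deleterious", "possibly damaging", "probably damaging"}


@dataclass
class Variant:
    gene: str
    consequence: str                       # e.g. "missense", "frameshift"
    provean_rankscore: Optional[float] = None
    polyphen: Optional[str] = None
    sift: Optional[str] = None


def is_driver(v: Variant) -> bool:
    """Apply the footnote's criteria to a single somatic variant."""
    if v.consequence in TRUNCATING:
        return True
    if v.consequence != "missense":
        return False
    if v.provean_rankscore is not None and v.provean_rankscore > 0.5:
        return True
    return any(
        label is not None and label.lower() in DAMAGING_LABELS
        for label in (v.polyphen, v.sift)
    )


def recurrence(samples: Dict[str, List[Variant]], gene: str) -> Tuple[int, float]:
    """Count samples with at least one driver in `gene`; return (n, percent)."""
    n = sum(
        any(v.gene == gene and is_driver(v) for v in variants)
        for variants in samples.values()
    )
    return n, round(100.0 * n / len(samples), 1)


# Hypothetical three-sample cohort.
cohort = {
    "S1": [Variant("APC", "stop_gain")],
    "S2": [Variant("APC", "missense", provean_rankscore=0.8)],
    "S3": [Variant("APC", "missense", provean_rankscore=0.2, sift="tolerated")],
}
apc_count, apc_percent = recurrence(cohort, "APC")
```

Each sample is counted at most once per gene, which matches the footnote's "at least one mutation" wording; for example, 58 of 77 MSS samples gives the 75.3% reported for APC.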
Figure 2. Mutational spectra from target capture sequencing (TCS) samples with notable germline variants. A) Normalized mutational spectrum from colorectal cancer sample CRC_4 (upper) against signature 14 from the Catalogue of Somatic Mutations in Cancer (COSMIC) (14) database (lower). B) Normalized mutational spectrum from colorectal cancer sample CRC_3 (upper) against signature 18 from the COSMIC (14) database (lower).
Figure 3. Search for putative driver variants in target capture sequencing (TCS) data. A) Snapshot from the University of California Santa Cruz (UCSC) Genome Browser, indicating the location of somatic mutations from our TCS and The Cancer Genome Atlas (TCGA) colorectal cancer cohort within the promoter of BMP3. B) Expression of BMP3 (left), APC (middle), and VTI1A (right) in promoter wild-type (wt) and mutant (mt) TCGA colorectal cancer samples, for the respective genes. n.s. = not statistically significant by unpaired t test; mean and standard deviation are shown. C) Quantile-quantile plots produced by OncodriveFML (26) showing the expected and observed distribution of P values demonstrating any functional somatic variant bias in coding exons of the colorectal cancer-associated genes sequenced (left) and all sequenced regions, excluding coding exons from sequenced colorectal cancer-associated genes (right). Dots represent different sequenced regions, with lighter colors indicating regions for which the number of mutated samples did not reach the minimum required to perform the multiple testing correction. Sequenced regions identified as statistically significant are indicated for q-value less than 0.1 (red) and q-value less than 0.25 (green). D) Snapshot from UCSC Genome Browser, indicating the location of indels within the putative promoter of MTERFD3. Transcription factor binding data are shown from the ENCODE (36) “Transcription Factor ChIP-seq (161 factors)” track. Grey boxes indicate peak clusters of transcription factor occupancy, where the darkness of each box signifies the maximum signal strength observed in any cell line contributing to that cluster. A green highlight within a box designates the site of the highest scoring canonical motif for the transcription factor indicated, via Factorbook (28) annotations. HCT-116 (human colon cancer cell line) H3K4me3 chromatin immunoprecipitation sequencing (ChIP-seq) and DNase I hypersensitivity sequencing (DNase-seq) data are also shown.
Figure 4. Validation by Sanger sequencing, and the genomic locus harboring deletions in the MTERFD3 putative promoter. A) Sequencing traces from Sanger sequencing of genomic DNA, depicting validation of the three indels within the MTERFD3 putative promoter. Sequencing traces are visualized using Geneious version 10.2.2 (http://www.geneious.com). B) Snapshot from the University of California Santa Cruz Genome Browser, indicating deletions (indels) within the putative promoter of MTERFD3, alongside chromatin immunoprecipitation sequencing (ChIP-seq) data for the transcription factors with motifs disrupted. Boxes contain the reference DNA sequence, with the deleted nucleotides marked by an orange box. Transcription factor binding motifs are shown from Factorbook (28), where a green bar depicts the span of the motif across the DNA sequence.