Elli-Mari Aska (1), Denis Dermadi (2), Liisa Kauppi (3).
1. Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland; Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland. Electronic address: elli.aska@helsinki.fi.
2. Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland; Laboratory of Immunology and Vascular Biology, Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305, USA; Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA. Electronic address: ddermadi@stanford.edu.
3. Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland; Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland. Electronic address: liisa.kauppi@helsinki.fi.
Abstract
DNA mismatch repair (MMR) corrects replication errors and is recruited by the histone mark H3K36me3, enriched in exons of transcriptionally active genes. To dissect in vivo the mutational landscape shaped by these processes, we employed single-cell exome sequencing on T cells of wild-type and MMR-deficient (Mlh1-/-) mice. Within active genes, we uncovered a spatial bias in MMR efficiency: 3' exons, often H3K36me3-enriched, acquire significantly fewer MMR-dependent mutations compared with 5' exons. Huwe1 and Mcm7 genes, both active during lymphocyte development, stood out as mutational hotspots in MMR-deficient cells, demonstrating their intrinsic vulnerability to replication error in this cell type. Both genes are H3K36me3-enriched, which can explain MMR-mediated elimination of replication errors in wild-type cells. Thus, H3K36me3 can boost MMR in transcriptionally active regions, both locally and globally. This offers an attractive concept of thrifty MMR targeting, where critical genes in each cell type enjoy preferential shielding against de novo mutations.
Maintaining genomic integrity during DNA replication is crucial for cellular homeostasis, especially in protein-coding regions. Occasionally, DNA replication errors occur, of which most, but not all, are corrected by the intrinsic proofreading activity of DNA polymerases (St Charles et al., 2015). DNA mismatch repair (MMR) corrects base-base mismatches and small insertion-deletion (indel) loops that have escaped proofreading and thereby protects the genome from replication-induced permanent mutations (Li, 2008). MMR initiates when the MSH2/MSH6 (MutSα) or MSH2/MSH3 (MutSβ) complex recognizes and binds DNA lesions, a step followed by recruitment of the MLH1/PMS2 (MutLα) complex that triggers the excision and repair of the mismatch (Lahue et al., 1989; Zhang et al., 2005).
MSH6 of MutSα can bind to trimethylated histone H3 lysine 36 (H3K36me3) and recruit the MMR machinery to chromatin (Li et al., 2013). H3K36me3 is found in exonic regions and enriched at the 3′ ends of transcribed genes (Kolasinska-Zwierz et al., 2009) and also in constitutive and facultative heterochromatin (Chantalat et al., 2011). Recently, H3K36me3 has been shown to also guide m6A deposition to mRNA (Huang et al., 2019), which is known to affect mRNA stability and translation (Huang et al., 2018a; Wang et al., 2014, 2015). Genome-wide mutational analyses of MMR-deficient cell lines and tumors have shown that the presence of H3K36me3 reduces the local mutation rate (Supek and Lehner, 2015, 2017). Moreover, in tumors and cell lines, MMR operates more efficiently in H3K36me3-enriched exons compared with introns (Frigola et al., 2017), and in actively transcribed genes compared with silent genes, and lowers the mutation frequency in the 3′ ends of genes (Huang et al., 2018b). The mutation signature of the error-prone polymerase η, which is part of the somatic hypermutation-specific MMR pathway, is targeted to 3′ ends of genes via H3K36me3 in solid tumors (Supek and Lehner, 2017).
MMR deficiency has been extensively modeled in Mlh1−/− mice, which display high microsatellite instability (MSI) and increased tumor mortality (Baker et al., 1996; Edelmann et al., 1996, 1999; Prolla et al., 1998). Female Mlh1−/− mice frequently develop lymphomas, mainly thymic, whereas males tend to develop gastrointestinal tumors (Gladbach et al., 2019). MSI occurs owing to the propensity of microsatellites (short tandem repeat sequences) to undergo strand slippage during DNA replication, which in MMR-deficient cells leads to deletion or insertion mutations within repeats. Recently, analysis of genome-wide mutations in Mlh1−/− lymphomas revealed several putative drivers of tumorigenesis (Daino et al., 2019; Gladbach et al., 2019).
To delineate how the mutational landscape in normal mammalian cells is shaped in vivo, on one hand, by replication errors, and on the other hand, by H3K36me3-mediated MMR correction, we performed single-cell whole-exome sequencing (scWES) on T cells isolated from MMR-proficient (Mlh1+/+) and MMR-deficient (Mlh1−/−) mice. Comparison of mutation distribution and frequency between MMR-proficient and -deficient mice revealed the Huwe1 and Mcm7 genes as mutational hotspots exclusive to Mlh1−/− cells, implying that these regions present an inherent challenge to faithful DNA replication in T cells. Both hotspots are located in H3K36me3-enriched regions and expressed during T cell development. Analysis of MMR-dependent mutations indicates that H3K36me3-enriched 3′ exons are more protected against transcription-associated replication errors.
Results
Deletions Report on MMR-Dependent Mutations in Single-Cell Exome Sequencing
We isolated naive T cells from thymi of Mlh1+/+ and Mlh1−/− mice, followed by single-cell capture and whole-genome amplification on the Fluidigm C1 system, and then by whole-exome enrichment and sequencing (Figure 1). Previous studies have utilized single-cell DNA sequencing to study clonality and mutation profiles of human cancers and normal cells (Leung et al., 2017; Wu et al., 2017; Zhang et al., 2019; Pellegrino et al., 2018). To check whether T cells were drawn from a similar cell population in both genotypes, we analyzed the proportions of distinct developmental thymic T cell populations (double negative, double positive, TCR αβ single positive [CD4 or CD8], TCR γδ) (Shah and Zuniga-Pflucker, 2014) by FACS. Cell frequencies of the different thymic T cell populations were similar between Mlh1+/+ and Mlh1−/− mice (Figure S1), indicating no defect in normal T cell developmental progression in Mlh1−/− mice, and that T cells analyzed by scWES from Mlh1+/+ and Mlh1−/− mice are drawn from similar thymic T cell populations. In both genotypes, the vast majority of cells were CD4+CD8+ double-positive T cells (67% for Mlh1+/+ and 65% for Mlh1−/− mice, respectively; Figure S1).
Figure 1
Whole-Exome Sequencing of Single T Cells: Experimental Overview
Thymi of Mlh1+/+ and Mlh1−/− mice were dissected and used for enrichment of naive T cells, followed by single-cell capture, cell lysis, and whole-genome amplification in a Fluidigm C1. Amplified genomes were used for whole-exome sequencing (WES), and sequencing reads were analyzed for genetic variants. Shown is a read pileup and coverage of sample WT1-C26 in a ~5-kb-long region on chromosome 1 that contains three exons of Raph1. In addition to exons (green bar in exome panel), WES also partially covers non-coding regions adjacent to exons (blue bar in exome panel), enabling the comparison of mutation frequency between exonic and non-coding regions.
We sequenced 56 single-cell exomes in total, from 28 Mlh1+/+ and 28 Mlh1−/− T cells, to an average depth of 32X and coverage of 66% at depth ≥1X (Figures S2A and S2B). After excluding samples with low (<50%) coverage, 44 exomes (22 Mlh1+/+ and 22 Mlh1−/− exomes) were further analyzed for genetic variants. All detected variants with annotations (see Transparent Methods sections “Variant calling and filtering” and “Mutation annotation”) are listed in Table S2, titled “Annotated variants in single-cell exomes.” Overall, Mlh1−/− T cells had an increased percentage (odds ratio [OR] = 1.56, 95% confidence interval [CI] = 1.44–1.69, p < 2.2 × 10^−16) and increased frequencies (p = 5.487 × 10^−6, Figures 2A and 2B and Table S1) of indels when compared with Mlh1+/+ T cells. Even though MMR deficiency also increases base substitutions (Meier et al., 2018), single nucleotide variant (SNV) frequencies did not differ significantly between Mlh1+/+ and Mlh1−/− cells in our dataset (p = 0.127, Figure 2B and Table S1). Analyzing insertions and deletions separately revealed that Mlh1−/− T cells had significantly higher deletion (p = 8.175 × 10^−12) but not insertion frequencies (p = 0.1801) than Mlh1+/+ T cells (Figure 2C and Table S1). Taken together, deletions behaved in a genotype-dependent manner and thus represent MMR-dependent mutations.
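The odds ratio and confidence interval quoted above compare the share of indels among all mutations in the two genotypes. As a minimal sketch of that kind of calculation, the Python below computes an odds ratio from a 2×2 table with a Wald (log-normal) 95% interval. This is an approximation chosen for illustration, not necessarily the estimator the authors used alongside Fisher's exact test, and the counts in the usage line are hypothetical rather than taken from Table S1.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Hypothetical layout (an assumption for this sketch):
        a = indels in Mlh1-/- cells,  b = SNVs in Mlh1-/- cells
        c = indels in Mlh1+/+ cells,  d = SNVs in Mlh1+/+ cells
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_wald_ci(30, 10, 20, 20)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as reported in the text, indicates that the indel share differs between genotypes.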
Figure 2
Global and Local Mutation Frequencies in Single T cells
(A–C) (A) Mlh1−/− T cells have an increased amount of indels out of total mutations in the whole exome compared with Mlh1+/+ T cells (p << 0.0001, Fisher's exact test). Global (B) indel and SNV frequencies and (C) deletion and insertion frequencies, in Mlh1+/+ and Mlh1−/− T cells. Mlh1−/− T cells have significantly higher indel, and especially deletion, frequencies than Mlh1+/+ T cells (p << 0.001, two-tailed Mann-Whitney U test). Data in (B) is shown as boxplots together with kernel probability density and individual datapoints, and data in (C) is shown as median and interquartile range together with kernel probability density and individual datapoints. Outlier cells (see Transparent Methods section "Outlier cells in single-cell exomes") are marked with red color in (B).
(D) Mutation frequencies in 1-Mb windows across the mouse genome. Mlh1−/− T cells have multiple high local mutation peaks originating from only a single T cell.
(E) Mcm7 and Huwe1 are mutational hotspots in Mlh1−/− T cells. Columns are sorted by genotype and cell ID (outliers excluded), rows based on the average mutation frequency. Mlh1+/+ cells have label WT and Mlh1−/− cells have label KO; biological replicates are marked with 1 and 2. Each cell has a cell identifier that originates from the Fluidigm C1 plate capture site.
Bar plots on the right show proportions of mutation types, locations, and consequences in genes. Left-hand-side columns show positivity or negativity for RNApol2 and H3K36me3 peaks (see also Figure S3).
Huwe1 and Mcm7 Genes Are Mutational Hotspots in Mlh1−/− T Cells
Mlh1−/− cells provide a unique opportunity to reveal which chromosomal regions represent a particular challenge to the fidelity of the replication machinery, as any errors that are introduced will remain uncorrected by MMR. To identify such regions, we analyzed mutation frequencies in 1-Mb windows across single-cell exomes. On a megabase scale, local mutational frequencies were highly heterogeneous. The majority of the high-mutation-frequency peaks originated only from single T cells, and mutational hotspot windows shared between individual cells were sparse (Figure 2D). To establish whether any genes would emerge as MMR-dependent mutational hotspots, we scored all genes for mutations and asked which ones were mutated frequently in Mlh1−/− T cells (in more than 5 Mlh1−/− cells). Two genes, Huwe1 and Mcm7, stood out with their high mutational frequencies, exclusive to Mlh1−/− single-cell exomes (Figure 2E). Huwe1 encodes an E3 ubiquitin ligase, shown to regulate hematopoietic stem cell self-renewal and proliferation, and commitment to the lymphoid lineage (King et al., 2016). Mcm7 encodes a component of the MCM2-7 complex that forms the core of the replicative helicase, responsible for unwinding DNA ahead of the replication fork (Deegan and Diffley, 2016). Both genes are positive for RNA polymerase 2 and H3K36me3 in the mouse thymus and expressed from hematopoietic stem cells all the way to thymic T cells (Figures 2E, S3A, and S3B).
We then compared the mutational hotspots in Mlh1+/+ and Mlh1−/− normal T cells (this study) with those in Mlh1−/− mouse lymphomas (Kakinuma et al., 2007; Daino et al., 2019; Gladbach et al., 2019). Only one shared mutational hotspot gene was found: Ttn, a massive gene with 324 exons, was mutated in both Mlh1+/+ and Mlh1−/− single-cell exomes (Figure 2E), in line with the findings of Daino et al. (2019). We did not identify any mutations in Ikzf1, previously reported as a mutational target gene in Mlh1-deficient T cell lymphomas (Daino et al., 2019; Kakinuma et al., 2007).
Other identified hotspot genes (Gm7361, Vps13c, Gm37013, Gm38667, Gm38666) were mutated in both Mlh1+/+ and Mlh1−/− T cells and thus were not specific for Mlh1 deficiency. All except Vps13c were negative or inconclusive for the presence of H3K36me3 and RNA polymerase 2, suggesting that these genes are not transcribed in mouse thymus (Figures 2E and S3A). Gm37013, Gm38667, and Gm38666 are predicted genes that physically overlap with each other on chromosome 18 (Figure S3A), which explains their identical mutational pattern.
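The two scoring steps described in this section, binning mutations into 1-Mb windows per cell and flagging genes mutated in more than five MMR-deficient cells, can be sketched as follows. The record layouts, cell labels, and function names are assumptions made for illustration. The sketch counts raw mutations and does not normalize by covered bases or check that a hotspot is absent from wild-type cells, both of which the full analysis would require.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1-Mb windows, as in the text

def window_counts(mutations):
    """Count mutations per (cell, chromosome, window index).

    `mutations` is an iterable of (cell_id, chrom, position) tuples;
    this record layout is a hypothetical choice for the sketch.
    """
    counts = defaultdict(int)
    for cell_id, chrom, pos in mutations:
        counts[(cell_id, chrom, pos // WINDOW_BP)] += 1
    return dict(counts)

def hotspot_genes(gene_hits, ko_cells, min_cells=5):
    """Return genes mutated in MORE than `min_cells` distinct KO cells.

    `gene_hits` is an iterable of (cell_id, gene) pairs and `ko_cells`
    is the set of cell identifiers from MMR-deficient animals.
    """
    cells_per_gene = defaultdict(set)
    for cell_id, gene in gene_hits:
        if cell_id in ko_cells:
            cells_per_gene[gene].add(cell_id)
    return sorted(g for g, cells in cells_per_gene.items() if len(cells) > min_cells)

# Hypothetical toy data
muts = [("KO1", "chr5", 1_500_000), ("KO1", "chr5", 1_900_000), ("KO1", "chr5", 2_000_000)]
print(window_counts(muts))
```

Counting distinct cells rather than total mutations keeps a single heavily mutated cell from creating a hotspot on its own, which matters here because most high-frequency windows originated from single cells.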
Insertions and Deletions Accumulate Differently within Repeats in Mlh1+/+ and Mlh1−/− T Cells
Next, we analyzed the size distribution of detected indels in single-cell exomes. Mlh1+/+ cells had more 1-nucleotide (nt) insertions than deletions, whereas this difference in Mlh1−/− T cells was evened out by increased 1-nt deletions (OR = 1.794, 95% CI = 1.531–2.101, p = 1.134 × 10^−13, Figure 3A). The same trend for 1-nt insertions as the dominant indel type in Mlh1+/+ cells was observed in bulk T cell DNA samples from the same mice (Figure S4).
Figure 3
Small Deletions Report on MMR-Dependent Mutations in Mouse T cells
(A) Indel length distribution as relative frequencies with Sison and Glaz 95% multinomial confidence intervals in Mlh1+/+ and Mlh1−/− T cells. Mlh1+/+ and Mlh1−/− cells have different ratios of 1-nt indels (p << 0.001, two-tailed Fisher's exact test). Indels of length ≥10 bp are binned together. See also Figure S4.
(B and C) (B) Relative and (C) normalized frequencies of deletions in microsatellites (MS) (mono-, di-, and trinucleotide repeats) and in non-microsatellite (random) sequence in single-cell samples.
(D and E) (D) Relative and (E) normalized frequencies of insertions in microsatellites (mono-, di-, and trinucleotide repeats) and in non-microsatellite (random) sequence in single-cell samples. Bar plots are ranked by descending mutation fraction within mononucleotide repeats. Mlh1−/− cells have significantly higher deletion frequencies in microsatellites than Mlh1+/+ cells (p << 0.001, two-tailed Mann-Whitney U test). Mutation frequencies are shown as boxplots. Outliers (see Transparent Methods section "Outlier cells in single-cell samples") are labeled with red in (B) and (D).
We then analyzed the sequence context of the detected indels. As expected, most deletions in Mlh1−/− cells occurred at mononucleotide microsatellites, whereas in Mlh1+/+ cells, most deletions were found in non-microsatellite sequences (Figure 3B). When deletion counts were corrected for the number of base pairs of either microsatellite or non-microsatellite sequences, deletion frequencies were higher in microsatellites than in non-microsatellite sequences, regardless of MMR status (Figure 3C). This underscores the well-documented intrinsic propensity of microsatellites to slippage during replication. As expected, Mlh1−/− cells had significantly higher deletion frequencies in microsatellite sequences compared with Mlh1+/+ cells (p = 9.505 × 10^−13, Figure 3C and Table S1). Insertion frequencies within repeats were more similar between Mlh1+/+ and Mlh1−/− T cells, occurring especially in mononucleotide repeats (Figure 3D). Mlh1−/− cells had somewhat higher insertion frequencies in the context of microsatellite sequences when compared with Mlh1+/+ cells (p = 0.039, Figure 3E and Table S1).
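The distinction between relative and normalized frequencies explains how most deletions in a cell can fall outside microsatellites while the per-base deletion rate is still higher inside them. The sketch below illustrates the arithmetic with hypothetical counts and sequence lengths. The category names and numbers are assumptions, not values from the study.

```python
def relative_fractions(counts):
    """Fraction of all deletions that fall in each sequence class."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations to distribute")
    return {k: v / total for k, v in counts.items()}

def per_base_frequencies(counts, covered_bases):
    """Deletions per covered base pair in each sequence class."""
    return {k: counts[k] / covered_bases[k] for k in counts}

# Hypothetical single-cell numbers: microsatellites are a tiny part of the exome
counts = {"microsatellite": 40, "non_microsatellite": 60}
covered_bases = {"microsatellite": 2_000, "non_microsatellite": 600_000}

print(relative_fractions(counts))
print(per_base_frequencies(counts, covered_bases))
```

With these toy numbers, non-microsatellite sequence holds the larger share of deletions, yet the per-base frequency is far higher in microsatellites, which is the pattern the normalized panels are meant to expose.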
Exons Show a Decreased Burden of MMR-Dependent Mutations
Exome sequencing, despite its name, captures not only exons but also exon-adjacent, non-coding regions (3′ and 5′ UTR, promoter, or introns) (Figure 1) (Guo et al., 2012). This enabled us to ask whether de novo mutations accumulate differently in these two functionally distinct genic regions (exonic versus non-coding) in Mlh1+/+ and Mlh1−/− cells.
No significant difference in SNV or insertion frequencies was observed in either exonic or non-coding regions in Mlh1−/− cells compared with Mlh1+/+ cells (Figures 4A and 4B). In contrast, deletion frequencies increased in Mlh1−/− cells in non-coding regions compared with Mlh1+/+ cells (p = 9.94 × 10^−5, Figure 4C and Table S1). Exonic deletion frequencies in Mlh1−/− cells did not differ from those observed in Mlh1+/+ cells (Figure 4C), indicating that, in the absence of functional MMR, the integrity of coding regions is still maintained, likely by purifying selection, as suggested for MMR-deficient tumors by Kim et al. (2013). In conclusion, deletions, which we determined to be MMR-dependent mutations, increased more in non-coding regions adjacent to exons than in exons themselves.
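Per-cell mutation frequencies are compared between genotypes throughout with the two-tailed Mann-Whitney U test. As a rough illustration of what that test computes, the standard-library sketch below ranks the pooled values (averaging ranks for ties), derives U, and obtains a two-sided p value from the normal approximation without tie or continuity correction. It is a teaching sketch under those simplifying assumptions, not the authors' implementation, and the input values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the plain normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Hypothetical per-cell deletion frequencies for two genotypes
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
```

Because the test uses ranks only, a few outlier cells with very high mutation counts do not dominate the comparison, which suits single-cell data of this kind. For real analyses an established statistics package with exact or tie-corrected p values would be preferable.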
Figure 4
Mlh1−/− Cells Accumulate Mutations in Non-coding Regions of the Genome
(A) SNV, (B) insertion, and (C) deletion frequencies in exonic and non-coding (3′ and 5′ UTRs, promoters, splice sites, introns) regions of the exome in Mlh1+/+ and Mlh1−/− T cells. Mlh1−/− T cells have significantly higher frequencies of non-coding deletions (p << 0.001, two-tailed Mann-Whitney U test). Data is shown as boxplots together with kernel probability density and individual datapoints.
H3K36me3-Enriched Regions Are Depleted of MMR-Dependent Mutations
Results from large tumor datasets strongly indicate that exons have a decreased mutation burden due to H3K36me3-mediated MMR (Frigola et al., 2017), but evidence of this in normal cells and tissues in vivo is still lacking. To assess whether replication errors in transcribed genes are buffered by MMR by virtue of their H3K36me3 enrichment, we first analyzed H3K36me3 abundance in RNA polymerase 2 (RNApol2)-positive (RNApol2+) and -negative (RNApol2-) genes in thymus using publicly available ChIP-seq data (ENCODE Project Consortium, 2012; Sloan et al., 2016). Presence of RNA polymerase 2 in the promoter region is a strong indicator of transcriptional activity (Barski et al., 2007), and we used it to score genes as either active (RNApol2+) or silent (RNApol2-). H3K36me3 levels in RNApol2+ genes were higher than in RNApol2- and peaked at the centers of the exons in these genes (Figure 5A), confirming that H3K36me3 is associated with transcriptional activity also in mouse thymus. However, not all RNApol2+ genes were positive for H3K36me3. Approximately 65% of RNApol2+ genes were also positive for H3K36me3, whereas 80% of H3K36me3-positive (H3K36me3+) genes were positive for RNApol2 (Figure 5B).
Figure 5
H3K36me3 Reduces the Amount of MMR-Dependent Mutations in Exons
(A) H3K36me3 fold change (FC) (mean ± SD) in 1,000-bp window around exon centers in RNApol2-positive and -negative genes.
(B) Venn diagram of RNApol2-positive (+) and H3K36me3-positive (+) gene counts.
(C and D) Proportions of small deletions in genes positive or negative for H3K36me3 and RNApol2 in (C) Mlh1+/+ and (D) Mlh1−/− cells. Coding regions in genes positive for H3K36me3 have fewer deletions relative to H3K36me3-negative genes in Mlh1+/+ cells (p = 0.018, OR = 0.44, two-tailed Fisher's exact test), but not in Mlh1−/− cells.
(E and F) Deletion frequencies in the first to second exons (5′ exons) and the third to last exons (3′ exons) in RNApol2 (E) -positive and (F) -negative genes. In RNApol2-positive genes, Mlh1−/− cells have higher deletion frequency than Mlh1+/+ cells, especially in the third to last exons (high H3K36me3) and, to a lesser degree, in the first to second exons (low H3K36me3). The first panel shows the deletion frequencies in Mlh1+/+ and Mlh1−/− cells. Data is shown as median and interquartile range together with kernel probability density and individual datapoints. See also Figures S5 and S6. The second panel shows a schematic of H3K36me3 enrichment along a gene. The third panel shows a schematic of a gene structure. The fourth panel shows H3K36me3 signal as mean ± SD of FC in the first to second exons and third to last exons together with effect size as Cohen's d with Bessel's correction. Deletion frequencies were tested using two-tailed Mann-Whitney U test.
We analyzed how small deletions (that is, MMR-dependent mutations) were distributed to exons and non-coding regions based on either RNApol2 or H3K36me3 status of genes. The proportion of exonic deletions over non-coding deletions was decreased in H3K36me3+ genes compared with H3K36me3-negative (H3K36me3-) genes in Mlh1+/+ (p = 0.018, OR = 0.44, 95% CI = 0.198–0.906) but not in Mlh1−/− T cells (p = 1, OR = 0.972, 95% CI = 0.542–1.694, Figures 5C and 5D). A lower exonic deletion burden in RNApol2+ genes was also observed in Mlh1+/+ cells, similar to H3K36me3+ genes, although this trend did not reach significance (p = 0.062, OR = 0.528, 95% CI = 0.250–1.060, Figure 5C). The similar trends are not surprising, given the overlap between RNApol2+ and H3K36me3+ genes (Figure 5B). These results support H3K36me3-guided, MMR-dependent protection of exons against genetic alterations.
The H3K36me3 mark is less abundant in 5′ exons, compared with 3′ exons of genes (Kolasinska-Zwierz et al., 2009; Frigola et al., 2017). To test whether local H3K36me3 levels affect the intra-genic distribution of mutations in vivo, we compared deletion frequencies in the first and second exons (from here on referred to as 5′ exons) with those in the third to last exons (from here on referred to as 3′ exons), both in RNApol2+ and RNApol2- genes. In RNApol2+ genes, H3K36me3 signal increased in 3′ exons compared with 5′ exons (d = 0.335, Figures 5E and S6A), whereas in RNApol2- genes, there was no difference in H3K36me3 levels between 3′ and 5′ exons (d = 0.002, Figures 5F, S6A, and Table S1). In RNApol2+ genes, Mlh1−/− cells had higher deletion frequencies in 3′ exons (high in H3K36me3) compared with Mlh1+/+ cells (p = 4.57 × 10^−5, Figures 5E, S6B, and Table S1). In 5′ exons (low in H3K36me3), the difference in deletion frequencies between Mlh1+/+ and Mlh1−/− cells was smaller, yet significant (p = 0.016, Figures 5E, S6B, and Table S1). Mlh1−/− cells also had somewhat increased deletion frequencies in the 3′ exons compared with 5′ exons (p = 0.020, Figures 5E, S6B, and Table S1). Sequencing coverage was similar between samples with or without mutations in the analyzed exons, except in the 5′ exons in RNApol2+ regions in Mlh1 cells (p = 0.04, Figure S5). Taken together, these results suggest that 3′ exons in transcriptionally active genes are more prone to acquiring replication-induced mutations compared with 5′ exons and that this effect is tempered by H3K36me3-guided MMR.
No difference was observed in the deletion frequencies between Mlh1+/+ and Mlh1−/− cells in RNApol2- genes in 5′ exons (p = 0.539) or 3′ exons (p = 0.296, Figures 5F, S6B, and Table S1). Mlh1 cells, however, showed slightly higher deletion frequencies in 3′ exons compared with 5′ exons (p = 0.049, Figures 5F, S6B, and Table S1). H3K36me3- exons in RNApol2- genes accumulated mutations at similar frequencies in both Mlh1+/+ and Mlh1−/− cells. We interpret this to mean that the MMR machinery does not operate efficiently in these regions even in wild-type cells. RNApol2+, but not RNApol2-, genes showed genotype-dependent spatial variability in deletion frequencies; thus transcriptional activity appears to affect accumulation and/or repair of replication errors.
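The H3K36me3 difference between 5′ and 3′ exons is reported above as Cohen's d with Bessel's correction, that is, a standardized mean difference whose pooled standard deviation is built from sample variances with an n − 1 denominator. A minimal standard-library sketch of that formula follows. The input values are hypothetical fold-change numbers, and the pooled-variance form shown is the common textbook definition, assumed here rather than confirmed from the Transparent Methods.

```python
import math
from statistics import mean, variance  # statistics.variance divides by n - 1

def cohens_d(group_a, group_b):
    """Cohen's d using a pooled, Bessel-corrected standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical H3K36me3 fold-change values for 3' and 5' exons
three_prime = [2, 4, 6]
five_prime = [1, 3, 5]
print(cohens_d(three_prime, five_prime))
```

By the usual rule of thumb, a d near 0.3, as reported for RNApol2+ genes, is a small-to-moderate shift, while a d near 0, as in RNApol2- genes, indicates essentially no difference in signal between exon groups.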
Discussion
Using single-cell sequencing of mouse thymic T cells, we uncovered how the exome-wide mutational landscape is shaped in vivo by replication errors, along with MMR-mediated error correction. We identify the Huwe1 and Mcm7 genes as novel mutational hotspots in normal Mlh1 thymic T cells. We further provide evidence for transcription-associated vulnerability to replication errors and for H3K36me3-guided MMR at 3′ exons of genes.We show that scWES is a sensitive approach for unraveling signatures of replication errors and MMR activity. This is highlighted by the fact that we detected a substantial increase of deletions in Mlh1 T cells and found evidence of insertional bias in Mlh1 T cells. DNA polymerases tend to create more deletions than insertions, especially in repeat sequences (Baptiste et al., 2015; Kunkel, 1986; Kim et al., 2013; Lujan et al., 2015; Woerner et al., 2015; Garcia-Diaz and Kunkel, 2006), and in the absence of MMR (which is the situation in Mlh1 cells), one would expect to directly detect replication errors. Indeed, we observed a significant increase of small deletions in Mlh1 cells compared with Mlh1 cells. Taken together, we conclude that deletions reliably report on replication errors that would otherwise be repaired by MMR. In addition, we found that Mlh1 cells had more insertions than deletions. Increase in 1-nt insertions rather than deletions in Mlh1 cells has also been observed at unstable microsatellite loci in other MMR-proficient normal mouse tissues (Shrestha et al., 2019). Our findings are in line with the previously reported bias for MMR to correct deletion loops more efficiently than insertion loops, thereby creating an insertional bias at microsatellite sequences (Baptiste et al., 2013).MMR-deficient cells (Mlh1) accumulate replication-induced errors with every cell division. 
Developing lymphocytes are particularly susceptible to replication errors because they undergo multiple rounds of proliferative expansions during development and maturation. Comparison of mutational frequencies in Mlh1 versus Mlh1 T cell exomes revealed two hotspots for replication errors, Huwe1 and Mcm7 genes. Mutations in Mcm7 affected both exons and introns, whereas mutations in Huwe1 were found exclusively in introns (Figure 2E). Exonic Mcm7 mutations comprised both synonymous and non-synonymous mutations. Synonymous exonic Mcm7 mutations, although they do not alter amino acid sequence, may still affect Mcm7 splicing regulatory sites or miRNA binding sites or cause changes in mRNA stability or translation efficiency. Intronic mutations may cause splicing defects, resulting in exon skipping or intron retention (Diederichs et al., 2016). A small fraction (1%–2%) of somatic mutations that alter amino acid sequence create neoepitopes that, when presented on the cell membrane, can provoke immune cell attack (Yamamoto et al., 2019). Mlh1-deficient mousecancer cell lines have been shown to produce persistently neoantigens, both in vitro and in vivo (Germano et al., 2017). Neoantigenicity is unlikely, however, for MCM7 and HUWE1 that reside in the nucleus and/or cytosol, and thus, they lack the appropriate cellular localization to function as neoantigens. Because Huwe1 and Mcm7 are vulnerable to replication errors, we propose that over time, in Mlh1-deficient cells, damaging mutations will emerge in these genes, some with potentially tumorigenic effects. Indeed, deleterious mutations in Huwe1 and Mcm7 have been reported in Mlh1-deficient murine T cell lymphomas (Daino et al., 2019). 
The propensity of Mcm7, which codes for an integral component of the replication machinery, to acquire deleterious mutations in MMR-deficient cells (Figure 2E) could conceivably further accelerate the accumulation of replication-associated errors, thereby adding insult to injury.
Both Huwe1 and Mcm7 are expressed in the T lymphocyte lineage and required for lymphocyte development. Shielding them from permanent mutations is likely important for cellular homeostasis and normal development, and Huwe1 and Mcm7 were in fact devoid of mutations in Mlh1+/+ T cells. In the face of frequent replication errors, how is efficient targeting of MMR to these regions ensured in wild-type cells? Both Huwe1 and Mcm7 were enriched for H3K36me3 in the mouse thymus, and H3K36me3-mediated MMR has been shown to protect actively transcribed genes (Huang et al., 2018b). Thus, H3K36me3-mediated recruitment of MMR to these genes provides an explanation for efficient error correction in wild-type cells; in the absence of MMR, H3K36me3 no longer has a protective effect.
At single-cell resolution, the protective effect of H3K36me3-mediated MMR on active genes also appears to hold true more globally. In wild-type cells, coding regions in H3K36me3-enriched genes exhibited lower mutation frequencies than coding regions in H3K36me3-depleted genes. This effect was abolished in MMR-deficient cells. Our results indicate that H3K36me3-mediated MMR preserves the integrity of active genes in normal tissues in vivo, similar to what has been shown previously for tumors and cell lines (Supek and Lehner, 2015; Frigola et al., 2017; Huang et al., 2018b).
Moreover, we provide in vivo evidence that 3′ ends of actively transcribed genes are more prone to replication-associated errors and that more efficient recruitment of MMR via H3K36me3 protects these regions, ensuring that most of these errors do not become permanent mutations. 
Head-on collisions of the replication and transcription machineries can cause indels and base substitutions and especially increase the deletion burden within 3′ ends (and to a lesser degree 5′ ends) of genes under active transcription (Sankar et al., 2016). In HeLa cells, mutation frequency has been shown to decrease toward the 3′ end of the gene body, as H3K36me3 increases, implying more efficient MMR-mediated repair in these regions (Huang et al., 2018b). SNVs also accumulate more in 3′ UTRs than in 5′ UTRs in aging B lymphocytes (Zhang et al., 2019), in line with the notion that 3′ regions are in fact more prone to mutations. Efficient recruitment of the MMR machinery via H3K36me3 can shield against replication-induced errors specifically in transcribed genes, whose integrity is particularly important.
Here, we delineate the mutational landscape of T cells shaped by the status of DNA repair (functional versus impaired), dissected at the single-cell level in the context of H3K36me3. We provide evidence that, in normal thymocytes in vivo, MMR preferentially protects H3K36me3-positive genes, and especially 3′ exons transcribed in the T cell lineage, against accumulation of de novo mutations, providing an additional layer to the regional dynamics of H3K36me3-guided MMR. In addition, we identify Huwe1 and Mcm7 as novel mutational hotspots in (still phenotypically normal) Mlh1-/- T cells; both genes are important during T cell development. Taken together, our results suggest an attractive concept of thrifty MMR targeting, where genes critical for the development of a given cell type and under mutational stress due to active transcription are preferentially shielded from acquiring deleterious mutations.
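The comparison underlying the global H3K36me3 argument (mutation frequency in H3K36me3-enriched versus H3K36me3-depleted coding regions, stratified by Mlh1 genotype) can be illustrated with a minimal sketch. This is not the analysis pipeline used in the study, which is described in the Transparent Methods; the function names, the group labels, and all input numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def mutation_frequency_per_mb(variant_groups, callable_bases):
    """Return mutations per megabase for each (genotype, h3k36me3_status) group.

    variant_groups: iterable of (genotype, h3k36me3_status) tuples, one per variant call.
    callable_bases: dict mapping each group to the number of bases covered in that group.
    """
    counts = Counter(variant_groups)
    return {
        group: counts[group] * 1_000_000 / bases
        for group, bases in callable_bases.items()
    }


def enriched_to_depleted_ratio(frequencies, genotype):
    """Ratio below 1 means enriched genes carry fewer mutations per base than depleted genes."""
    return frequencies[(genotype, "enriched")] / frequencies[(genotype, "depleted")]


# Synthetic toy input (NOT data from the study): variant calls labeled by
# genotype and by the H3K36me3 status of the gene they fall in.
toy_variants = (
    [("Mlh1+/+", "enriched")] * 2
    + [("Mlh1+/+", "depleted")] * 4
    + [("Mlh1-/-", "enriched")] * 10
    + [("Mlh1-/-", "depleted")] * 10
)
# Hypothetical assumption: every group covers the same number of bases.
toy_callable_bases = {group: 2_000_000 for group in set(toy_variants)}

toy_frequencies = mutation_frequency_per_mb(toy_variants, toy_callable_bases)
for genotype in ("Mlh1+/+", "Mlh1-/-"):
    print(genotype, enriched_to_depleted_ratio(toy_frequencies, genotype))
```

With this invented input, the ratio is below 1 for the wild-type group and equal to 1 for the MMR-deficient group, which mirrors the qualitative pattern described above (a protective effect that disappears without MMR) but says nothing about the actual effect sizes. The same grouping logic could be applied to 5′ versus 3′ exons by swapping the second label.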
Limitations of the Study
The number of sequenced single T cells in our study is limited. For a comprehensive view of mutational hotspots and mutation frequencies, more single cells should be sequenced.
Owing to the limited starting amount of DNA, single-cell genomes were amplified extensively in order to have enough material for sequencing. This amplification introduces in vitro artifacts, which affect the analysis of mutation frequencies and mutational features. Genuine de novo SNVs in particular are expected to be masked by such artifacts.
Our exomic dataset represents less than 2% of the whole mouse genome, and specifically the coding portion, where de novo mutations are under the strongest natural selection. In order to understand how mutation frequency plays out in intergenic regions and the factors contributing to this dynamic, whole-genome sequencing should be conducted.
Resource Availability
Lead Contact
Any further queries and requests should be addressed to the corresponding author and lead contact, Liisa Kauppi (liisa.kauppi@helsinki.fi), or to the corresponding authors Elli-Mari Aska (elli.aska@helsinki.fi) and Denis Dermadi (ddermadi@stanford.edu).
Materials Availability
This study did not generate new unique reagents.
Data and Code Availability
Single-cell exome sequencing data generated and analyzed during the current study are deposited as raw reads in FASTQ format in SRA: PRJNA575619. The variants observed in single T cells supporting the conclusions of this article are provided with the article as Table S2, titled “Annotated variants in single-cell exomes,” in xlsx file format. Publicly available H3K36me3 (ENCODE: ENCFF853BYO, ENCFF287DIJ) and RNApol2 (ENCODE: ENCFF119XEH) ChIP-seq data can be found in the ENCODE database (https://www.encodeproject.org).
Methods
All methods can be found in the accompanying Transparent Methods supplemental file.
Authors: W Edelmann; K Yang; M Kuraguchi; J Heyer; M Lia; B Kneitz; K Fan; A M Brown; M Lipkin; R Kucherlapati Journal: Cancer Res Date: 1999-03-15 Impact factor: 12.701
Authors: H Wu; X-Y Zhang; Z Hu; Q Hou; H Zhang; Y Li; S Li; J Yue; Z Jiang; S M Weissman; X Pan; B-G Ju; S Wu Journal: Oncogene Date: 2016-12-12 Impact factor: 9.867
Authors: Stefan M Woerner; Elena Tosti; Yan P Yuan; Matthias Kloor; Peer Bork; Winfried Edelmann; Johannes Gebert Journal: Mol Carcinog Date: 2014-09-11 Impact factor: 4.784
Authors: Lei Zhang; Xiao Dong; Moonsook Lee; Alexander Y Maslov; Tao Wang; Jan Vijg Journal: Proc Natl Acad Sci U S A Date: 2019-04-16 Impact factor: 11.205
Authors: Sven Diederichs; Lorenz Bartsch; Julia C Berkmann; Karin Fröse; Jana Heitmann; Caroline Hoppe; Deetje Iggena; Danny Jazmati; Philipp Karschnia; Miriam Linsenmeier; Thomas Maulhardt; Lino Möhrmann; Johannes Morstein; Stella V Paffenholz; Paula Röpenack; Timo Rückert; Ludger Sandig; Maximilian Schell; Anna Steinmann; Gjendine Voss; Jacqueline Wasmuth; Maria E Weinberger; Ramona Wullenkord Journal: EMBO Mol Med Date: 2016-05-02 Impact factor: 12.137