Igor B Rogozin, Abiel Roche-Lima, Kathrin Tyryshkin, Kelvin Carrasquillo-Carrión, Artem G Lada, Lennard Y Poliakov, Elena Schwartz, Andreu Saura, Vyacheslav Yurchenko, David N Cooper, Anna R Panchenko, Youri I Pavlov.
Abstract
Cancer genomes harbor numerous genomic alterations and many cancers accumulate thousands of nucleotide sequence variations. A prominent fraction of these mutations arises as a consequence of the off-target activity of DNA/RNA editing cytosine deaminases followed by the replication/repair of edited sites by DNA polymerases (pol), as deduced from the analysis of the DNA sequence context of mutations in different tumor tissues. We have used the weight matrix (sequence profile) approach to analyze mutagenesis due to Activation Induced Deaminase (AID) and two error-prone DNA polymerases. Control experiments using shuffled weight matrices and somatic mutations in immunoglobulin genes confirmed the power of this method. Analysis of somatic mutations in various cancers suggested that AID and DNA polymerases η and θ contribute to mutagenesis in contexts that almost universally correlate with the context of mutations in A:T and G:C sites during the affinity maturation of immunoglobulin genes. Previously, we demonstrated that AID contributes to mutagenesis in (de)methylated genomic DNA in various cancers. Our current analysis of methylation data from malignant lymphomas suggests that driver genes are subject to different (de)methylation processes than non-driver genes and, in addition to AID, the activity of pols η and θ contributes to the establishment of methylation-dependent mutation profiles. This may reflect the functional importance of interplay between mutagenesis in cancer and (de)methylation processes in different groups of genes. The resulting changes in CpG methylation levels and chromatin modifications are likely to cause changes in the expression levels of driver genes that may affect cancer initiation and/or progression.
Keywords: computational biology; database; frequency matrices; gene expression; immunoglobulin genes; somatic hypermutation; tumor cells
Year: 2021 PMID: 34093666 PMCID: PMC8170131 DOI: 10.3389/fgene.2021.671866
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Nucleotide frequency matrices for mutations at A:T sites [(A) DNA pol η; (B) pol θ] and G:C sites [(C) pol θ; (D) DNA pol η]. Known mutable motifs (consensus sequences) (Matsuda et al., 2001; Rogozin et al., 2001) are shown below each matrix in bold, and mutable positions are underlined. Putative (previously unobserved) parts of mutable motifs and potentially informative positions are italicized. W = A or T; Y = T or C; B = C, G, or T; D = A, T, or G. Source of data: Supplementary Figures 2, 3.
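As an illustration of how a nucleotide frequency matrix can be turned into a weight matrix and used to score a candidate site, here is a minimal Python sketch. It is not the authors' implementation: the log2-odds weighting, the pseudocount, the uniform background, the function names, and the toy sites are all assumptions made for this example.

```python
import math

BASES = "ACGT"

def frequency_matrix(sites):
    """Count each base at each position of equal-length, aligned sites."""
    length = len(sites[0])
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        if len(site) != length:
            raise ValueError("all sites must have the same length")
        for pos, base in enumerate(site):
            counts[pos][base] += 1
    return counts

def weight_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights.

    Assumptions (not taken from the paper): a pseudocount of 1 per base
    and a uniform background frequency of 0.25.
    """
    weights = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        weights.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return weights

def site_weight(weights, site):
    """Sum the per-position weights of the bases found in `site`."""
    return sum(weights[pos][base] for pos, base in enumerate(site))

# Made-up aligned sites, used only to exercise the functions.
toy_sites = ["TAGCA", "AAGCT", "TAGCT", "AAGCA"]
wm = weight_matrix(frequency_matrix(toy_sites))
motif_like = site_weight(wm, "TAGCA")
unrelated = site_weight(wm, "CCTGG")
```

In this toy case a site that resembles the aligned examples receives a higher weight than an unrelated one, which is the property a profile-based comparison of mutated and non-mutated sites relies on.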
FIGURE 2 Schematic representation of the procedure used for construction of Table 3. Each circle represents a methylated CpG site, with its size reflecting the methylation level. Red “X” denotes CpG sites that overlap with known mutable motifs. The left and right panels correspond to thresholds of 25% and 75%, respectively. Left panel: set “1” (methylation levels smaller than 25%) is compared to set “2” (methylation levels larger than 25%). Right panel: set “1” (methylation levels larger than 75%) is compared to set “2” (methylation levels smaller than 75%).
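The partitioning step described in this caption can be sketched as follows. This is a hedged illustration only: the `CpGSite` record, the function names, the toy values, and the choice to place sites lying exactly on the threshold into set 2 are assumptions, since the caption does not specify them.

```python
from typing import NamedTuple

class CpGSite(NamedTuple):
    methylation: float  # methylation level in percent (0-100)
    in_motif: bool      # True if the site overlaps a known mutable motif

def split_sites(sites, threshold, set1_is_low):
    """Split sites into set 1 and set 2 around a methylation threshold.

    set1_is_low=True  -> set 1 holds sites below the threshold (left panel).
    set1_is_low=False -> set 1 holds sites above the threshold (right panel).
    Sites exactly at the threshold go to set 2 (an assumption).
    """
    set1, set2 = [], []
    for site in sites:
        if set1_is_low:
            in_set1 = site.methylation < threshold
        else:
            in_set1 = site.methylation > threshold
        (set1 if in_set1 else set2).append(site)
    return set1, set2

def motif_fraction(sites):
    """Fraction of sites that overlap a mutable motif (NaN for an empty set)."""
    return sum(s.in_motif for s in sites) / len(sites) if sites else float("nan")

# Made-up sites, used only to exercise the functions.
toy = [CpGSite(10, True), CpGSite(20, False), CpGSite(50, False),
       CpGSite(80, True), CpGSite(90, True)]
left_set1, left_set2 = split_sites(toy, 25, set1_is_low=True)
right_set1, right_set2 = split_sites(toy, 75, set1_is_low=False)
```

The motif fraction is just one simple way to summarize each set here; the statistics actually reported in the table are the ratio and MC test values.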
Analysis of methylation in CpG sites that overlap with pols η and θ mutable motifs.
| Group of genes | Number of CpG sites | Tests | AID | Pol η | Pol θ |
| Driver | | Ratio | 1.025 | 0.997 | 0.994 |
| | 2,867 | | | NSE | NSE |
| | 149,480 | MC test | <0.001 | 0.772 | 0.95 |
| Non-driver | | Ratio | 1.027 | 0.993 | 0.985 |
| | 5,558 | | | NSE | NSE |
| | 239,220 | MC test | <0.001 | 0.989 | 0.989 |
| Driver | | Ratio | 1.004 | 1.009 | 1.021 |
| | 96,917 | | NSE | | |
| | 51,290 | MC test | 0.433 | <0.001 | <0.001 |
| Non-driver | | Ratio | 1.007 | 1.009 | 1.023 |
| | 155,205 | | | | |
| | 89,573 | MC test | <0.001 | <0.001 | <0.001 |
FIGURE 3 Correlation of pol η (eta) and θ (theta) mutable motifs and the sequence context of somatic mutations. For the actual data, see Supplementary Tables 3, 4. The intensities of the gray color correspond to the t-test values (Supplementary Tables 3, 4). Unweighted pair group method with arithmetic mean (UPGMA) clustering of ratio values for the pol η and θ footprints and tissues is shown to the left and top. The upper left panel shows the distribution of the studied t-test values and the correspondence between t-test values and color intensity (darker colors correspond to higher correlation values). A similar plot of ratio values (the ratio being the mean weight of mutated sites divided by the mean weight of non-mutated sites) is shown in Supplementary Figure 5.
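The ratio defined in this caption (mean weight of mutated sites divided by mean weight of non-mutated sites) can be computed as in the sketch below. The Monte Carlo step is an assumption: the text here does not describe how the MC test is run, so a generic one-sided label-shuffling test is used for illustration. Function names, the iteration count, the seed, and the toy weights are hypothetical.

```python
import random
from statistics import mean

def weight_ratio(mutated, non_mutated):
    """Mean weight of mutated sites divided by mean weight of non-mutated sites.

    Assumes the mean of the non-mutated weights is non-zero.
    """
    return mean(mutated) / mean(non_mutated)

def monte_carlo_p_value(mutated, non_mutated, n_iter=1000, seed=0):
    """One-sided label-shuffling test (an assumed stand-in for the MC test).

    Estimates how often a random relabeling of sites yields a ratio at least
    as large as the observed one; a +1 correction keeps the estimate above 0.
    """
    rng = random.Random(seed)
    observed = weight_ratio(mutated, non_mutated)
    pooled = list(mutated) + list(non_mutated)
    k = len(mutated)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if weight_ratio(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)

# Made-up positive weights, used only to exercise the functions.
toy_mutated = [1.2, 1.5, 1.1, 1.4]
toy_non_mutated = [1.0, 0.9, 1.1, 1.0]
ratio = weight_ratio(toy_mutated, toy_non_mutated)
p_value = monte_carlo_p_value(toy_mutated, toy_non_mutated)
```

A ratio above 1 with a small p-value would indicate that mutated sites tend to match the motif better than non-mutated sites, which is how the Ratio and MC test rows in the tables are most naturally read.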
Correlation between the sequence context of somatic mutations and mutable motifs in fragments of human immunoglobulin genes.
| Locus | Test | Number of Mutations | AID/G:C | Pol η/G:C | Pol θ/G:C | Number of Mutations | Pol η/A:T | Pol θ/A:T |
| V | Ratio | 583 | 1.208 | 1.027 | 1.091 | 351 | 1.082 | 0.979 |
| NSE | NSE | |||||||
| | MC test | | <0.001 | 0.004 | <0.001 | | <0.001 | 0.699 |
| J | Ratio | 177 | 1.341 | 1.05 | 1.029 | 95 | 1.041 | 1.032 |
| NSE | ||||||||
| | MC test | | <0.001 | 0.002 | 0.106 | | 0.004 | 0.011 |
| J | Ratio | 227 | 1.278 | 1.009 | 1.011 | 25 | 0.957 | 0.98 |
| NSE | NSE | NSE | NSE | |||||
| | MC test | | <0.001 | 0.329 | 0.061 | | 0.776 | 0.67 |
Correlation between mutable motifs and the sequence context of somatic mutations in driver and non-driver genes.
| Group of genes | Test | Number of G:C mutations | AID/G:C | Pol η/G:C | Pol θ/G:C | Number of A:T mutations | Pol η/A:T | Pol θ/A:T |
| All genes | Ratio | 137,775 | 1.021 | 1.005 | 1.091 | 145,768 | 0.992 | 1.011 |
| | | | | | | | NSE | |
| | MC test | | <0.001 | <0.001 | <0.001 | | 1 | <0.001 |
| Drivers | Ratio | 4,246 | 1.107 | 1.001 | 1.007 | 3,918 | 0.98 | 1.032 |
| | | | | NSE | NSE | | NSE | |
| | MC test | | <0.001 | 0.346 | 0.037 | | 1 | <0.001 |
| Non-drivers | Ratio | 3,553 | 1.079 | 1.059 | 1.057 | 2,793 | 0.995 | 1.045 |
| | | | | | | | NSE | |
| | MC test | | <0.001 | <0.001 | <0.001 | | 0.874 | <0.001 |
FIGURE 4 Putative mechanism of the interplay between AID and TLS polymerases.
Levels of methylation at positions of somatic mutations in CpG sites (threshold value = 75%).
| Group of genes | Number of mutations in CpG sites | Tests | AID | Pol η | Pol θ |
| Driver | | Ratio | 1.111 | 1.136 | 1.046 |
| 249 | NSE | ||||
| | 52 | MC test | 0.004 | <0.001 | 0.009 |
| Non-driver | | Ratio | 1.015 | 1.125 | 1.061 |
| 390 | NSE | ||||
| | 264 | MC test | 0.222 | <0.001 | <0.001 |
FIGURE 5 Violin plot of mRNA expression (FPKM values) for sets of driver and non-driver genes. Values were log2-transformed.