Nuri A Temiz, Duncan E Donohue, Albino Bacolla, Karen M Vasquez, David N Cooper, Uma Mudunuri, Joseph Ivanic, Regina Z Cer, Ming Yi, Robert M Stephens, Jack R Collins, Brian T Luke.
Abstract
DNA damage in somatic cells originates from both environmental and endogenous sources, giving rise to mutations through multiple mechanisms. When these mutations affect the function of critical genes, cancer may ensue. Although identifying genomic subsets of mutated genes may inform therapeutic options, a systematic survey of tumor mutational spectra is required to improve our understanding of the underlying mechanisms of mutagenesis involved in cancer etiology. Recent studies have presented genome-wide sets of somatic mutations as a 96-element vector, a procedure that only captures the immediate neighbors of the mutated nucleotide. Herein, we present a 32 × 12 mutation matrix that captures the nucleotide pattern two nucleotides upstream and downstream of the mutation. A somatic autosomal mutation matrix (SAMM) was constructed from tumor-specific mutations derived from each of 909 individual cancer genomes harboring a total of 10,681,843 single-base substitutions. In addition, mechanistic template mutation matrices (MTMMs) representing oxidative DNA damage, ultraviolet-induced DNA damage, 5mCpG deamination, and APOBEC-mediated cytosine mutation are presented. MTMMs were mapped to the individual tumor SAMMs to determine the maximum contribution of each mutational mechanism to the overall mutation pattern. A Manhattan distance across all SAMM elements between any two tumor genomes was used to determine their relative distance. Employing this metric, 89.5% of all tumor genomes were found to have a nearest neighbor from the same tissue of origin. When a distance-dependent 6-nearest neighbor classifier was used, 10.4% of the SAMMs had an Undetermined tissue of origin, and 92.2% of the remaining SAMMs were assigned to the correct tissue of origin [corrected]. Thus, although tumors from different tissues may have similar mutation patterns, their SAMMs often display signatures that are characteristic of specific tissues.
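The distance and classification steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the distance-dependent 6-nearest neighbor classifier weights its neighbors or when it returns Undetermined, so the inverse-distance vote, the `min_share` cut-off, the function names, and the four-element toy vectors below are all assumptions made for illustration only; a real SAMM would be flattened to 384 elements.

```python
from collections import defaultdict

def manhattan(a, b):
    """Sum of absolute element-wise differences between two flattened SAMMs."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(query, labelled, k=6, min_share=0.5):
    """Distance-weighted k-nearest-neighbor vote (assumed scheme).

    labelled  : list of (flattened_samm, tissue) pairs
    min_share : hypothetical cut-off; if the winning tissue holds a smaller
                share of the total vote, the call is 'Undetermined'
    """
    nearest = sorted((manhattan(query, m), t) for m, t in labelled)[:k]
    votes = defaultdict(float)
    for dist, tissue in nearest:
        votes[tissue] += 1.0 / (dist + 1e-12)  # closer neighbors weigh more
    tissue, weight = max(votes.items(), key=lambda kv: kv[1])
    share = weight / sum(votes.values())
    return tissue if share >= min_share else "Undetermined"

# Toy 4-element "matrices" (not real data)
labelled = [
    ([0.5, 0.5, 0.0, 0.0], "liver"),
    ([0.6, 0.4, 0.0, 0.0], "liver"),
    ([0.0, 0.0, 0.5, 0.5], "skin"),
    ([0.0, 0.0, 0.4, 0.6], "skin"),
]
print(classify([0.55, 0.45, 0.0, 0.0], labelled, k=3))
print(classify([0.25, 0.25, 0.25, 0.25], labelled, k=4, min_share=0.9))
```

The first query sits next to the two liver-like vectors, while the second is roughly equidistant from both groups and therefore falls below the assumed cut-off.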
Year: 2015 PMID: 26001532 PMCID: PMC4495249 DOI: 10.1007/s00439-015-1566-1
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Details of the 21 whole-genome sequencing datasets examined in this study
| Dataset | Label | #Samples | Total #SBSs | Sourceᵃ |
|---|---|---|---|---|
| Acute lymphoblastic leukemia | ALL | 1 | 7442 | Sanger |
| Acute myeloid leukemia (South Korea) | LAML-KR | 4 | 377,876 | ICGC |
| Breast triple negative/lobular cancer | BRCA-UK | 18 | 165,808 | ICGC |
| Breast cancer | Breast | 77 | 534,046 | Sanger |
| Esophageal adenocarcinoma | ESAD-UK | 16 | 290,325 | ICGC |
| Liver cancer NCC | LINC-JP | 31 | 329,052 | ICGC |
| Liver cancer RIKEN | LIRI-JP | 188 | 1,922,567 | ICGC |
| Liver cancer | Liver | 84 | 790,487 | Sanger |
| Lung adenocarcinoma | Lung_Adeno | 23 | 1,386,149 | Sanger |
| Malignant lymphoma DKFZ | MALY-DE | 37 | 267,052 | ICGC |
| Melanoma | Melanoma | 25 | 1,841,735 | (Berger et al. |
| Ovarian cancer QCMG | OV-AU | 89 | 833,427 | ICGC |
| Pancreatic cancer OICR | PACA-CA | 45 | 333,342 | ICGC |
| Pancreatic cancer QCMG | PACA-AU | 137 | 954,081 | ICGC |
| Pancreatic cancer endocrine neoplasms QCMG | PAEN-AU | 12 | 43,028 | ICGC |
| Pancreatic cancer | Pancreas | 14 | 103,032 | Sanger |
| Medulloblastoma | Medulloblastoma | 11 | 35,125 | Sanger |
| Pediatric brain tumors BMBF | PBCA-DE | 16 | 46,147 | ICGC |
| Prostate adenocarcinoma | PRAD-UK | 3 | 18,763 | ICGC |
| Prostate cancer | Prostate | 7 | 21,603 | Sanger |
| Renal clear cell carcinoma | RECA-EU | 71 | 379,756 | ICGC |
| Total | | 909 | 10,681,843 | |
ᵃ Sanger: ftp://sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/; ICGC: ftp://data.dcc.icgc.org/current/
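The dataset table reports only totals, so the average mutational burden per genome has to be derived by dividing each SBS total by its sample count. The sketch below does this for a handful of rows copied from the table; the selection of rows and the variable names are illustrative choices, not part of the original study.

```python
# (number of samples, total SBSs) for a few rows of the dataset table
datasets = {
    "ALL": (1, 7442),
    "LIRI-JP": (188, 1922567),
    "Lung_Adeno": (23, 1386149),
    "Melanoma": (25, 1841735),
    "Medulloblastoma": (11, 35125),
}

# Mean number of single-base substitutions per tumor genome
burden = {label: total / n for label, (n, total) in datasets.items()}

for label, value in sorted(burden.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:16s} {value:12,.1f}")
```

Among these rows, melanoma and lung adenocarcinoma carry tens of thousands of substitutions per genome, whereas medulloblastoma carries a few thousand.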
SAMM for the acute lymphoblastic leukemia sample PD4020a
| Motif | 1.a | 2.a | 3.a | 1.t | 2.t | 3.t | 1.c | 2.c | 3.c | 1.g | 2.g | 3.g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAA | 0 | 0 | 0 | 0.000461 | 0.000546 | 0.000649 | 0.000188 | 0.000154 | 0.000256 | 0.000154 | 0.000154 | 0.000324 |
| AAT | 0 | 0 | 0.000264 | 0.000448 | 0.000264 | 0 | 0.000105 | 0.000158 | 0.000501 | 0.000158 | 0.000501 | 0.00029 |
| AAC | 0 | 0 | 0.00108 | 0.000405 | 0.000405 | 0.002611 | 0.000135 | 0.000135 | 0 | 0.000135 | 0.000585 | 0.000675 |
| AAG | 0 | 0 | 0.007979 | 0.000099 | 0.000493 | 0.001543 | 0.000033 | 0.000296 | 0.018453 | 0.00023 | 0.000328 | 0 |
| AGA | 0 | 0.023608 | 0 | 0.000267 | 0.00234 | 0.000326 | 0.000089 | 0.044194 | 0.000178 | 0.000385 | 0 | 0.000326 |
| AGT | 0 | 0.000488 | 0.000204 | 0.000326 | 0.000936 | 0 | 0.000448 | 0.001303 | 0.000204 | 0.000488 | 0 | 0.000163 |
| AGC | 0 | 0.000979 | 0.000606 | 0.000233 | 0.000699 | 0.001305 | 0.000093 | 0.000699 | 0 | 0.000093 | 0 | 0.000373 |
| AGG | 0 | 0.001615 | 0.004405 | 0.000294 | 0.000661 | 0.001468 | 0.00022 | 0.000771 | 0.004552 | 0.00022 | 0 | 0 |
| TAA | 0.000504 | 0 | 0 | 0 | 0.00041 | 0.000473 | 0.000252 | 0.000126 | 0.000189 | 0.000189 | 0.000032 | 0.000221 |
| TAT | 0.000319 | 0 | 0.000351 | 0 | 0.000256 | 0 | 0.000192 | 0.000288 | 0.000319 | 0.000256 | 0.000319 | 0.000192 |
| TAC | 0.000232 | 0 | 0.00081 | 0 | 0.000405 | 0.001389 | 0.000289 | 0.000058 | 0 | 0.000116 | 0.000347 | 0.000984 |
| TAG | 0.000255 | 0 | 0.007126 | 0 | 0.000356 | 0.001476 | 0.000204 | 0.000153 | 0.018986 | 0.000051 | 0.000305 | 0 |
| TGA | 0.000167 | 0.054179 | 0 | 0 | 0.004883 | 0.000201 | 0.000368 | 0.046353 | 0.000067 | 0.000201 | 0 | 0.000468 |
| TGT | 0.000293 | 0.001073 | 0.000358 | 0 | 0.000976 | 0 | 0.00013 | 0.000943 | 0.000586 | 0.000293 | 0 | 0.000163 |
| TGC | 0 | 0.001179 | 0.000816 | 0 | 0.001179 | 0.001587 | 0.000181 | 0.000408 | 0 | 0.000091 | 0 | 0.000499 |
| TGG | 0.000106 | 0.002586 | 0.003189 | 0 | 0.000957 | 0.001169 | 0.000106 | 0.001453 | 0.005137 | 0.000142 | 0 | 0 |
| CAA | 0.001803 | 0 | 0 | 0.012343 | 0.000139 | 0.000312 | 0 | 0.000139 | 0.000104 | 0.024062 | 0.000139 | 0.000277 |
| CAT | 0.002253 | 0 | 0.000179 | 0.017593 | 0.00025 | 0 | 0 | 0.000358 | 0.00025 | 0.009476 | 0.00025 | 0.000179 |
| CAC | 0.001607 | 0 | 0.000956 | 0.012946 | 0.000043 | 0.001868 | 0 | 0.000087 | 0 | 0.005951 | 0.000217 | 0.001129 |
| CAG | 0.002477 | 0 | 0.010743 | 0.019492 | 0.000161 | 0.000965 | 0 | 0.000161 | 0.012223 | 0.011869 | 0.000193 | 0 |
| CGA | 0.000295 | 0.035959 | 0 | 0.020927 | 0.002063 | 0.000295 | 0 | 0.007958 | 0.000295 | 0.003832 | 0 | 0.000589 |
| CGT | 0.000776 | 0.024316 | 0 | 0.016814 | 0.000259 | 0 | 0 | 0 | 0 | 0.001811 | 0 | 0 |
| CGC | 0.001084 | 0.011925 | 0.000271 | 0.019784 | 0 | 0.001897 | 0 | 0.000271 | 0 | 0.001626 | 0 | 0.000813 |
| CGG | 0.000468 | 0.009822 | 0.003976 | 0.021749 | 0.000468 | 0.000234 | 0 | 0.000468 | 0.002339 | 0.000935 | 0 | 0 |
| GAA | 0.037661 | 0 | 0 | 0.004192 | 0.000233 | 0.000133 | 0.032704 | 0.000067 | 0.0001 | 0 | 0.000366 | 0.000566 |
| GAT | 0.024472 | 0 | 0.000246 | 0.002948 | 0.000295 | 0 | 0.032236 | 0.000098 | 0.000393 | 0 | 0.000393 | 0.000246 |
| GAC | 0.015498 | 0 | 0.000554 | 0.00173 | 0.000208 | 0.001661 | 0.018681 | 0.000277 | 0 | 0 | 0.000415 | 0.000761 |
| GAG | 0.037753 | 0 | 0.006079 | 0.003291 | 0.000116 | 0.001123 | 0.05026 | 0.000194 | 0.009487 | 0 | 0.000426 | 0 |
| GGA | 0.002454 | 0.012271 | 0 | 0.00055 | 0.002708 | 0.000042 | 0.000677 | 0.012779 | 0.000169 | 0 | 0 | 0.000381 |
| GGT | 0.001518 | 0.000562 | 0.000225 | 0.000956 | 0.000787 | 0 | 0.000899 | 0.00045 | 0.000394 | 0 | 0 | 0.000056 |
| GGC | 0.001037 | 0.000819 | 0.000873 | 0.000437 | 0.000382 | 0.001965 | 0.000437 | 0.000164 | 0 | 0 | 0 | 0.000327 |
| GGG | 0.003413 | 0.000693 | 0.005046 | 0.000742 | 0.000297 | 0.000841 | 0.001435 | 0.000247 | 0.001929 | 0 | 0 | 0 |
Fractional mutation frequencies (7442 SBSs)
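The layout of the 32 × 12 matrix can be reproduced from the row and column labels of the table above. The sketch below is one reading of that layout: rows are three-nucleotide motifs with A or G in the middle, and each column `p.x` is taken to mean "the base at position p of the motif mutates to x". That interpretation, and the helper names `is_structural_zero` and `cell_index`, are assumptions inferred from the table rather than definitions given in the text.

```python
BASES = "ATCG"  # base order used by the table's rows and columns

# 32 rows: first base x middle purine (A or G) x last base
motifs = [first + mid + last for first in BASES for mid in "AG" for last in BASES]

# 12 columns: "<position>.<new base>", grouped by new base (a, t, c, g)
columns = [f"{pos}.{base.lower()}" for base in BASES for pos in (1, 2, 3)]

def is_structural_zero(motif, column):
    """True when the new base equals the base already at that position."""
    pos, new_base = column.split(".")
    return motif[int(pos) - 1] == new_base.upper()

def cell_index(motif, column):
    """Zero-based (row, column) position of an element in the matrix."""
    return motifs.index(motif), columns.index(column)

n_zero = sum(is_structural_zero(m, c) for m in motifs for c in columns)
print(len(motifs), len(columns), n_zero, len(motifs) * len(columns) - n_zero)
print(cell_index("TGA", "2.a"))
```

Under this reading, three cells per row are zero by construction (for example 1.a, 2.a, and 3.a in the AAA row), which leaves 288 of the 384 elements able to hold a non-zero frequency. Other zeros in the table simply reflect mutations not observed in this sample.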
Computed and experimental ionization potentials (eV) of guanine and adenine using different basis sets
| Basis set | Guanine | Adenine |
|---|---|---|
| 6-31G(d) | 8.02 | 8.32 |
| cc-pVDZ | 8.01 | 8.32 |
| cc-pVTZ | 8.19 | 8.45 |
| cc-pVQZ | 8.22 | 8.48 |
| Expt. | 8.24 | 8.44 |
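To compare the basis sets, the signed deviation of each computed ionization potential from the experimental value can be tabulated. The sketch below uses only the numbers in the table above and assumes they are all expressed in eV; the variable names are illustrative.

```python
# Computed ionization potentials (guanine, adenine) per basis set, assumed in eV
computed = {
    "6-31G(d)": (8.02, 8.32),
    "cc-pVDZ": (8.01, 8.32),
    "cc-pVTZ": (8.19, 8.45),
    "cc-pVQZ": (8.22, 8.48),
}
experiment = (8.24, 8.44)

# Signed error (computed minus experiment), rounded to the table's precision
errors = {
    basis: (round(g - experiment[0], 2), round(a - experiment[1], 2))
    for basis, (g, a) in computed.items()
}

for basis, (dg, da) in errors.items():
    print(f"{basis:10s} guanine {dg:+.2f}  adenine {da:+.2f}")
```

The two smaller basis sets underestimate both values by more than 0.1 eV, while the triple- and quadruple-zeta sets fall within 0.05 eV of experiment.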
Computed vertical ionization potential (VIP) and vertical singlet excitation energy (VSEE) of the most likely pyrimidine π–π* transition (among the lowest three) for each of the DNA fragments
| Guanine-centered sequence (5′-NGN-3′) | VIP (eV) | VSEE (eV)ᵃ | Adenine-centered sequence (5′-NAN-3′) | VIP (eV) | VSEE (eV)ᵃ |
|---|---|---|---|---|---|
| GGG | 5.39 | 6.40 (6) | GAG | 5.74 | 6.34 (5) |
| GGA | 5.50 | 6.37 (5) | GAC | 5.88 | 6.34 (5) |
| GGT | 5.54 | 6.37 (5) | GAA | 5.89 | 6.29 (3) |
| AGG | 5.57 | 6.39 (5) | CAG | 5.91 | 6.35 (5) |
| TGG | 5.59 | 6.39 (5) | GAT | 5.91 | 6.34 (4) |
| CGG | 5.60 | 6.36 (5) | AAG | 5.92 | 6.22 (3) |
| GGC | 5.63 | 6.37 (6) | CAC | 5.92 | – |
| TGA | 5.64 | 6.26 (3) | TAG | 5.93 | 6.35 (4) |
| AGA | 5.66 | 6.32 (5) | CAT | 6.04 | – |
| CGA | 5.79 | 6.31 (4) | TAC | 6.05 | – |
| AGT | 5.81 | 6.29 (4) | CAA | 6.05 | 6.23 (2) |
| CGT | 5.86 | – | AAC | 6.11 | 6.32 (4) |
| CGC | 5.88 | – | AAA | 6.37 | 6.27 (3) |
| TGT | 5.90 | – | TAA | 6.51 | 6.24 (3) |
| AGC | 5.95 | 6.32 (5) | AAT | 6.52 | 6.23 (3) |
| TGC | 5.97 | – | TAT | 6.55 | – |
ᵃ The number of the excited state (ground state = 0) is given in parentheses.