Abstract
Cancer-associated mutations in cancer genes constitute a diverse set of mutations associated with the disease. To gain insight into features of this set, substitution, deletion and insertion mutations from the COSMIC database were analysed at the nucleotide level. The most frequent substitutions were c → t, g → a and g → t, and the most frequent codon changes were to termination codons. Deletions were more frequent than insertions, FS (frameshift) indels more frequent than I-F (in-frame) ones, and single-nucleotide indels were common. FS indels cause loss of significant fractions of proteins. The 5′-cut in FS deletions, and the 5′-ligation in FS insertions, often occur between pairs of identical bases. Interestingly, the cut-site and 3′-ligation in insertions, and the 3′-cut and join-pair in deletions, were each found to be the same significantly often (p < 0.001). It is suggested that these features aid the incorporation of indel mutations. Tumor suppressors undergo larger numbers of mutations, especially disruptive ones, over the entire protein length, to inactivate both alleles. Proto-oncogenes undergo fewer, less-disruptive mutations, in selected protein regions, to activate a single allele. Finally, ranked catalogues were created of the genes mutated in each cancer and of the cancers in which each gene is mutated. The study highlights the nucleotide-level preferences and disruptive nature of cancer mutations.
Year: 2012 PMID: 22492711 PMCID: PMC3413105 DOI: 10.1093/nar/gks290
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Histograms showing the frequency of occurrence of each of the 12 possible base changes at pos1, pos2 and pos3 of codons in: (a) synonymous, (b) missense and (c) nonsense substitutions. In each histogram, base changes are indicated along the x-axis, the number of times that each base change is observed (frequency) is indicated along the y-axis, and the frequencies of base changes at pos1, pos2 and pos3 of codons are shown as separate series.
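Figure 1 and the substitution summary split single-base substitutions by consequence (synonymous, missense, nonsense, no-stop) and by codon position. As a minimal sketch of that bookkeeping, assuming the standard genetic code (NCBI translation table 1) and lowercase DNA codons as written in the text, a codon change can be classified as below. The function name `classify` is hypothetical, and stop codons (TER in the text) are written as `*`.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in t, c, a, g order
# with the third base varying fastest; "*" marks a stop (TER) codon.
BASES = "tcag"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(wt: str, mut: str) -> tuple[str, list[str]]:
    """Return the consequence of a codon change and its base changes by codon position."""
    wt, mut = wt.lower(), mut.lower()
    changes = [f"pos{i + 1}:{a}→{b}" for i, (a, b) in enumerate(zip(wt, mut)) if a != b]
    aa_wt, aa_mut = CODE[wt], CODE[mut]
    if aa_wt == aa_mut:
        kind = "synonymous"
    elif aa_mut == "*":
        kind = "nonsense"   # sense codon changed to a termination codon
    elif aa_wt == "*":
        kind = "no-stop"    # termination codon changed to a sense codon
    else:
        kind = "missense"
    return kind, changes


for wt, mut in [("cga", "tga"), ("gag", "aag"), ("ggt", "gat"), ("tgg", "tga")]:
    print(wt, "->", mut, classify(wt, mut))
```

For example, `classify("cga", "tga")` returns `("nonsense", ["pos1:c→t"])`, the cga_R→tga_TER change listed as the most frequent single-base substitution.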
Summary of results for substitution mutations
Numbers of 1-, 2-, 3-, 4- and 5-base substitutions: 6013, 169, 12, 2, 2

| Types | Observed | At pos1 | At pos2 | At pos3 |
|---|---|---|---|---|
| Synonymous | 489 | 20 | 0 | 469 |
| Missense | 4555 | 1951 | 2262 | 342 |
| Nonsense | 964 | 672 | 132 | 160 |
| No-stop | 5 | 0 | 3 | 2 |
| Total | 6013 | 2643 | 2397 | 973 |
| Most frequent base changes | c→t (1267), g→a (1239), g→t (697) | c→t (713), g→a (593), g→t (399) | g→a (457), c→t (375), g→t (204) | g→a (189), c→t (179) |
| Numbers of WT bases undergoing substitution: | ||||
| Numbers of mutant bases after substitution: g(1041), c(897), | ||||
| Amino acids undergoing the most substitutions: | ||||
| Synonymous: G(65), L(60) | ||||
| Nonsense: R(208), Q(206), E(191) | ||||
| Missense: G(654) undergoes the most mutations; C(294), K(262), N(211) are generated in significant numbers; interesting mutations: P(239)→S(91), L(86); A(296)→T(108), V(94); Y(136)→C(69); E(286)→K(153) | ||||
| Most frequently occurring single-base substitutions: | ||||
| cga_R→tga_TER, 194; cag_Q→tag_TER, 158; gag_E→tag_TER, 102; gaa_E→taa_TER, 89; gag_E→aag_K, 87; ggt_G→gat_D, 75; gaa_E→aaa_K, 66; ggc_G→gac_D, 63; ggc_G→agc_S, 60; cgg_R→tgg_W, 59; tct_S→ttt_F, 58; tgg_W→tga_TER, 57; gtg_V→atg_M, 52 | ||||
| WT bases substituted most frequently: gg (39), cc (31), tg (22), gc (16) | ||||
| Mutant bases observed most frequently: tt (51), aa (29), at (21), ct (13) | ||||
| Most frequently observed substitutions: cc→tt (28), gg→aa (13), gg→tt (10) | ||||
| Amino acid mutations resulting from 2- and 3-base substitutions: | ||||
| Approximately how many different mutant codons does a WT codon mutate to in cancer? | ||||
| 61 WT → six or more mutant | ||||
| 44 WT → 8–11 mutant | ||||
| 13 WT → 6–7 mutant | ||||
| 4 WT → 15–19 mutant (ctg_L → 15; ggc_G → 15; gtg_V → 16; ggt_G → 19) | ||||
| 1 WT TER codon → three non-TER mutant (no-stop mutations) | ||||
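The substitution summary is internally consistent, and a short check makes the arithmetic explicit: each row's observed count is the sum over the three codon positions, and each column sums to the Total row. The counts below are transcribed from the table; placing the g→t (204) entry at pos2 is an assumption based on that arithmetic (697 − 399 − 204 leaves 94 at pos3, fewer than the 179 c→t changes listed there).

```python
# Single-base substitution counts transcribed from the summary table:
# type -> (observed, at pos1, at pos2, at pos3)
counts = {
    "synonymous": (489, 20, 0, 469),
    "missense": (4555, 1951, 2262, 342),
    "nonsense": (964, 672, 132, 160),
    "no-stop": (5, 0, 3, 2),
}
total_row = (6013, 2643, 2397, 973)

# Each row: the observed count equals the sum over the three codon positions.
for kind, (observed, pos1, pos2, pos3) in counts.items():
    assert observed == pos1 + pos2 + pos3, kind

# Each column sums to the Total row.
column_sums = tuple(sum(column) for column in zip(*counts.values()))
assert column_sums == total_row

# Base-change totals split across codon positions.
assert 713 + 375 + 179 == 1267  # c→t
assert 593 + 457 + 189 == 1239  # g→a

# Assumption: g→t (204) is the pos2 entry, so the remaining g→t changes are at pos3.
g_to_t_pos3 = 697 - 399 - 204
assert g_to_t_pos3 < 179  # consistent with g→t not being listed among pos3's most frequent

# The 61 WT codons with six or more mutant codons break down as 44 + 13 + 4.
assert 44 + 13 + 4 == 61

print(column_sums, g_to_t_pos3)  # (6013, 2643, 2397, 973) 94
```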
Figure 2. Histogram showing the frequency of occurrence of each of the 12 possible base changes when all substitution mutations (synonymous, missense, nonsense) occurring at pos1, pos2 and pos3 of codons are considered. Base changes are indicated along the x-axis and their frequencies along the y-axis.
Figure 3. Histogram showing the frequency of occurrence of each type of FS and I-F deletion and insertion (for nomenclature, see Supplementary Figure S3). The first series shows the frequency distribution for deletions, the second for insertions.
Figure 4. Length distributions of the different types of FS and I-F: (a) deletions and (b) insertions. The lengths of indels (in nt) and their frequencies of occurrence are given along the x- and y-axes, respectively. For each length, the three series give the frequencies of the three types of deletions or insertions specified along the x-axis; for example, deletions of length 1 nt result from FS deletions of types 1-1, 2-2 and 3-3, whose frequencies are 305, 332 and 337, respectively. The first three bars give the frequencies of 1-1, 2-2, 3-3 type FS indels (1 nt), the next three give the frequencies of 1-2, 2-3, 3-1 type FS indels (2 nt) and the next three give the frequencies of 1-3, 2-1, 3-2 type I-F indels (3 nt). The cycle then repeats, with the next three bars again giving the frequencies of 1-1, 2-2, 3-3 type FS indels (4 nt), and so on.
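The cycle of indel types in Figure 4 follows a simple modular rule if a type label "X-Y" gives the codon positions (1, 2 or 3) of the first and last affected nucleotides. That reading is an assumption (the nomenclature itself is defined in Supplementary Figure S3), but it reproduces every type sequence listed in the caption. `indel_type` is a hypothetical helper; lengths that are multiples of 3 are in-frame.

```python
def indel_type(start_pos: int, length: int) -> tuple[str, str]:
    """Return (type label, "FS" or "I-F") for an indel of `length` nt whose first
    affected nucleotide is at codon position `start_pos` (1, 2 or 3).
    Assumes "X-Y" = codon positions of the first and last affected nucleotides."""
    if start_pos not in (1, 2, 3) or length < 1:
        raise ValueError("start_pos must be 1, 2 or 3 and length must be >= 1")
    end_pos = (start_pos - 1 + length - 1) % 3 + 1
    frame = "I-F" if length % 3 == 0 else "FS"
    return f"{start_pos}-{end_pos}", frame


for length in range(1, 5):
    print(length, [indel_type(p, length) for p in (1, 2, 3)])
```

Looping over lengths 1 to 4 gives 1-1, 2-2, 3-3 (FS), then 1-2, 2-3, 3-1 (FS), then 1-3, 2-1, 3-2 (I-F), and then 1-1, 2-2, 3-3 (FS) again, which is the repeating cycle the caption describes.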
Cut- and join-sites in indels
| Deletions | Insertions |
|---|---|
| FS: 2021; I-F: 588 | FS: 903; I-F: 347 |
| Sites examined: start-cut, end-cut, join-pair | Sites examined: cut-site, 5′-ligation, 3′-ligation |
| FS: | FS: |
| Start-cut: a-a, c-c, g-g, t-t | 5′-Ligation: a-a, t-t, c-c |
| End-cut: a-g | 3′-Ligation: t-g, a-c, a-g |
| I-F: | I-F: |
| No start-, end-cut preferences | No 5′-, 3′-ligation preferences |
| FS, I-F: | FS, I-F: |
| End-cut, join-pair often same | Cut-site, 3′-ligation often same |
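The exact definitions of the cut and join pairs are not spelled out in this section. Assuming the reading suggested by the table, where each site is a pair of adjacent nucleotides (start-cut = base before | first deleted base, end-cut = last deleted base | base after, join-pair = base before | base after; cut-site = the WT pair split open by an insertion, 5′-ligation = base before | first inserted base, 3′-ligation = last inserted base | base after), the pairs can be read off a sequence as in this sketch. `deletion_sites` and `insertion_sites` are hypothetical helpers, positions are 0-based, and the toy sequences are illustrative, not COSMIC data.

```python
def deletion_sites(seq: str, start: int, length: int) -> dict[str, str]:
    """Nucleotide pairs around the deletion of seq[start:start + length] (0-based)."""
    end = start + length  # index of the first retained base after the deletion
    if length < 1 or start < 1 or end >= len(seq):
        raise ValueError("deletion must be non-empty and flanked by a base on each side")
    before, after = seq[start - 1], seq[end]
    return {
        "start-cut": f"{before}-{seq[start]}",
        "end-cut": f"{seq[end - 1]}-{after}",
        "join-pair": f"{before}-{after}",  # new neighbours once the segment is removed
    }


def insertion_sites(seq: str, pos: int, inserted: str) -> dict[str, str]:
    """Nucleotide pairs around `inserted` placed between seq[pos - 1] and seq[pos] (0-based)."""
    if not inserted or pos < 1 or pos >= len(seq):
        raise ValueError("insertion must be non-empty and lie strictly inside seq")
    before, after = seq[pos - 1], seq[pos]
    return {
        "cut-site": f"{before}-{after}",
        "5'-ligation": f"{before}-{inserted[0]}",
        "3'-ligation": f"{inserted[-1]}-{after}",
    }


# Delete the last g of the run in "cagggt"; insert a g after the g of "cagt".
print(deletion_sites("cagggt", 4, 1))
print(insertion_sites("cagt", 3, "g"))
```

In these toy examples the start-cut (g-g) and 5′-ligation (g-g) fall between identical bases, the end-cut equals the join-pair (g-t), and the cut-site equals the 3′-ligation (g-t): the same patterns the table reports as frequent.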
Figure 5. Histograms showing the frequency with which each of the 16 pairs of adjacent nt is cut at the start and end of FS deletions (a and b), and occurs as 5′- and 3′-ligations in FS insertions (c and d). In (a), the two series show the frequencies with which each nt pair (e.g. a-a) is cut at the start and end of FS deletions [212, 93]; the difference between the two frequencies for each nt pair [119] is given in (b). In (c), the two series show the frequencies with which each nt pair (e.g. a-a) forms 5′- and 3′-ligations in FS insertions [137, 69]; the difference between the two frequencies for each nt pair [68] is given in (d).
Figure 6. Joint frequencies of cut- and join-sites in deletions and insertions. There are four groups of bars; the first two are for FS and I-F deletions, the last two for FS and I-F insertions. The first bar in each group gives the total number of mutations (FS or I-F deletions or insertions) that have cut- and join-sites. In the first two groups (FS and I-F deletions), the second, third, fourth and fifth bars give, respectively, the number of times that: (i) start-cut, end-cut and join-pair are the same, (ii) start-cut, end-cut and join-pair are different, (iii) only start-cut and join-pair are the same and (iv) only end-cut and join-pair are the same. In the last two groups (FS and I-F insertions), the second, third, fourth and fifth bars give, respectively, the number of times that: (i) cut-site, 5′-ligation and 3′-ligation are the same, (ii) cut-site, 5′-ligation and 3′-ligation are different, (iii) only cut-site and 5′-ligation are the same and (iv) only cut-site and 3′-ligation are the same.
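Figure 6 sorts each indel by which of its three nucleotide pairs coincide. A sketch of that bookkeeping, assuming the pairs are already available as strings such as `"g-t"` and that category (ii) means all three pairs differ: the reference pair is the join-pair for deletions and the cut-site for insertions. `joint_category` is a hypothetical name, and the extra outcome where only the two non-reference pairs match is not one of the bars in the caption but is returned so that every indel receives a label.

```python
from collections import Counter


def joint_category(ref: str, x: str, y: str, ref_name: str, x_name: str, y_name: str) -> str:
    """Compare two nt pairs (x, y) with a reference pair `ref` and name the outcome."""
    if x == ref and y == ref:
        return "all same"
    if x == ref:
        return f"only {x_name}, {ref_name} same"
    if y == ref:
        return f"only {y_name}, {ref_name} same"
    if x == y:
        return f"only {x_name}, {y_name} same"  # not one of the bars in the caption
    return "all different"


# Illustrative (start-cut, end-cut, join-pair) triples for three hypothetical deletions.
deletions = [("g-g", "g-t", "g-t"), ("c-t", "t-g", "c-g"), ("a-g", "a-g", "a-g")]
tally = Counter(
    joint_category(join, start, end, "join-pair", "start-cut", "end-cut")
    for start, end, join in deletions
)
print(tally)
```

For insertions, the same function is called with the cut-site as `ref` and the 5′- and 3′-ligations as `x` and `y`.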
Figure 7. (a) Histogram showing the fractions of protein lost as a result of FS [2021] and I-F [588] deletions (first and second series). Fractions are given as intervals along the x-axis, and the number of deletions occurring in each interval is given along the y-axis. The fraction of protein lost due to each deletion was calculated as (number of codons lost)/(number of codons in WT protein). The fraction was <0.1 for 87% (510/588) of I-F deletions, and ≥0.1 for 91% [(2021 − 178 = 1843)/2021], ≥0.2 for 84% (1705/2021) and ≥0.4 for 60% of FS deletions. (b) Histogram showing the fractions of protein gained or lost as a result of FS [903] and I-F [347] insertions (first and second series). Fractions are given as intervals along the x-axis (range, 0.3 through −1.0). Fractions >0 indicate an increase, and fractions <0 a decrease, in protein length. The number of observations in each interval is given along the y-axis. The fraction of protein gained or lost due to each insertion was calculated as (number of codons in mutant protein − number of codons in WT protein)/(number of codons in WT protein). Nearly 96% (333/347) of I-F insertions caused an increase, and 91% [(903 − 81 = 822)/903] of FS insertions caused a decrease, in protein length.
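The two length formulas in the Figure 7 caption, and the percentages it quotes, can be written out directly. This is a minimal sketch: the function names are hypothetical, the 500-codon protein is a made-up example, and the counts in `CAPTION_CHECKS` are the ones stated in the caption.

```python
def fraction_lost(codons_lost: int, wt_codons: int) -> float:
    """Deletions: (number of codons lost) / (number of codons in WT protein)."""
    return codons_lost / wt_codons


def fraction_changed(mutant_codons: int, wt_codons: int) -> float:
    """Insertions: (mutant codons - WT codons) / WT codons.
    > 0 means a longer protein, < 0 a shorter one (e.g. an early stop after a frameshift)."""
    return (mutant_codons - wt_codons) / wt_codons


# Percentages quoted in the caption, recomputed from the counts it gives.
CAPTION_CHECKS = [
    ("I-F deletions, fraction < 0.1", 510, 588, 87),
    ("FS deletions, fraction >= 0.1", 2021 - 178, 2021, 91),
    ("FS deletions, fraction >= 0.2", 1705, 2021, 84),
    ("I-F insertions, length increase", 333, 347, 96),
    ("FS insertions, length decrease", 903 - 81, 903, 91),
]
for label, count, total, quoted in CAPTION_CHECKS:
    percent = round(100 * count / total)
    assert percent == quoted, label
    print(f"{label}: {count}/{total} = {percent}%")

# Hypothetical protein of 500 codons.
print(fraction_lost(150, 500))     # 0.3
print(fraction_changed(540, 500))  # 0.08
print(fraction_changed(200, 500))  # -0.6
```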
Distribution of substitution, deletion and insertion mutations in 29 TS (tumor suppressors) and 24 PO (proto-oncogenes). The second column gives the basis of the classification of each gene: ts and po refer to the classification of the gene by Swiss-Prot (40), F refers to the classification given in Table 4 of ref. (24), V refers to that given in Table 1 of ref. (6) and N refers to that obtained via internet searches. The total numbers of mutations [6517, 2900] and of missense mutations [2156, 2138] observed in the sets of TS and PO are given at the end of each set.
| Gene names | Classifications | Synonymous | Missense | Nonsense | I-F deletions | FS deletions | I-F insertions | FS insertions |
|---|---|---|---|---|---|---|---|---|
| TS: | ||||||||
| GATA1 | tsF | 2 | 9 | 8 | 4 | 31 | 1 | 37 |
| ATM | ts | 2 | 106 | 16 | 6 | 21 | 1 | 9 |
| BRCA2 | tsN | 2 | 13 | 3 | 12 | 5 | ||
| MLH1 | ts | 2 | 16 | 3 | 1 | 9 | 1 | |
| MSH2 | ts | 5 | 10 | 10 | 1 | 9 | 1 | |
| MSH6 | tsF | 8 | 20 | 7 | 9 | 6 | ||
| TP53 | ts | 4 | 324 | 58 | 11 | 39 | 4 | 12 |
| SMARCA4 | tsN | 14 | 2 | 8 | ||||
| SMARCB1 | ts | 2 | 11 | 28 | 2 | 29 | 1 | 19 |
| NOTCH1 | ts/poN | 7 | 62 | 23 | 12 | 14 | 49 | 40 |
| RUNX1 | ts/poN | 5 | 35 | 5 | 22 | 7 | 23 | |
| CDH1 | tsF | 6 | 52 | 12 | 11 | 35 | 14 | |
| HNF1A | tsN | 33 | 7 | 3 | 16 | 1 | 6 | |
| NF1 | ts | 2 | 22 | 31 | 4 | 48 | 5 | |
| NF2 | ts | 5 | 25 | 73 | 26 | 241 | 26 | |
| VHL | ts | 37 | 232 | 37 | 49 | 306 | 5 | 78 |
| FBXW7 | tsF/tsN | 68 | 18 | 2 | 6 | 1 | 10 | |
| SMAD4 | tsF | 4 | 88 | 25 | 2 | 16 | 11 | |
| SOCS1 | tsN | 1 | 3 | 6 | 15 | 1 | ||
| APC | ts | 11 | 109 | 166 | 2 | 400 | 2 | 151 |
| CDC73 | ts | 2 | 7 | 1 | 15 | 3 | ||
| CDKN2A | ts | 68 | 310 | 75 | 43 | 103 | 4 | 35 |
| CEBPA | tsF/poN | 9 | 19 | 13 | 28 | 104 | 76 | 92 |
| MEN1 | tsF | 31 | 17 | 10 | 57 | 1 | 14 | |
| PTCH1 | ts | 10 | 74 | 45 | 5 | 37 | 2 | 18 |
| PTEN | ts | 20 | 366 | 107 | 42 | 257 | 5 | 110 |
| RB1 | po/tsF/tsV | 4 | 23 | 68 | 5 | 46 | 1 | 14 |
| STK11 | tsF | 9 | 46 | 19 | 10 | 27 | 10 | |
| WT1 | ts/poF/tsV | 33 | 16 | 2 | 28 | 1 | 76 | |
| TS totals: missense 2156; all mutations 6517 | ||||||||
| PO: | ||||||||
| PTPN11 | poF | 2 | 45 | |||||
| JAK2 | po | 8 | 35 | 6 | 5 | |||
| NPM1 | po/tsF | 2 | 52 | |||||
| BRAF | po | 16 | 152 | 2 | 6 | 1 | 3 | |
| MPL | po | 10 | 1 | |||||
| ABL1 | po | 1 | 24 | |||||
| ALK | po | 3 | 22 | 1 | ||||
| CSF1R | po | 6 | 14 | 3 | ||||
| CTNNB1 | poF | 21 | 402 | 5 | 91 | 4 | 2 | |
| EGFR | ts/poF/poN | 19 | 194 | 4 | 37 | 2 | 35 | 2 |
| ERBB2 | po | 3 | 32 | 10 | ||||
| FGFR3 | poF | 14 | 45 | 5 | 3 | 1 | ||
| FLT3 | poF | 4 | 30 | 6 | 1 | 82 | 1 | |
| GNAS | po | 21 | ||||||
| HRAS | po | 3 | 101 | 1 | 1 | 1 | ||
| KIT | po | 30 | 151 | 3 | 106 | 4 | 30 | |
| KRAS | po | 13 | 272 | 1 | 1 | 2 | 8 | |
| MET | po | 3 | 25 | 1 | 5 | |||
| NRAS | po | 3 | 121 | 1 | ||||
| PDGFRA | poF | 6 | 25 | 14 | 1 | 2 | 1 | |
| PIK3CA | ts/poF/poN | 14 | 332 | 3 | 11 | 3 | ||
| RET | po | 6 | 33 | 11 | ||||
| SMO | poF | 2 | 12 | 1 | ||||
| TSHR | poN | 38 | 2 | |||||
| PO totals: missense 2138; all mutations 2900 | ||||||||
Figure 8. Distribution of mutation positions over the lengths of proteins. Genes [40] are listed along the x-axis, and each gene name is prefixed by po, ts or b, indicating, respectively, whether the gene functions as a PO, a TS or both. For each gene there is a pair of related bars: the percentage of the protein length given in the first bar contains the percentage of mutation positions given in the second bar [Supplementary Methods (ii)]. For example, in the PO CTNNB1, 89% of all mutation positions (second bar) occur in 13% of the protein length (first bar). A tall second bar with a short first bar indicates that most mutations occur in a small segment of the protein; first and second bars of nearly equal height indicate that the mutations occur over the entire length of the protein.
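Supplementary Methods (ii), which defines the bar pairs, is not reproduced here. One simple way to quantify the same idea, offered as a hypothetical stand-in rather than the paper's procedure, is to fix a target fraction of mutation positions and find the shortest contiguous stretch of the protein that contains it. A small value indicates hotspot clustering (the pattern described for CTNNB1); a large value indicates mutations spread along the protein. The function name, the 800-residue protein and both position lists are illustrative assumptions.

```python
import math


def shortest_span_fraction(positions: list[int], protein_length: int, target: float) -> float:
    """Hypothetical clustering measure: the smallest fraction of the protein length,
    taken as one contiguous window, that contains at least `target` of the mutation positions."""
    if not positions or not 0 < target <= 1:
        raise ValueError("need at least one position and 0 < target <= 1")
    pos = sorted(positions)
    # Small tolerance in case target * len(pos) should be an exact integer but
    # floating point lands slightly above it.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    # Slide over every run of k consecutive sorted positions; keep the narrowest span (residues).
    narrowest = min(pos[i + k - 1] - pos[i] + 1 for i in range(len(pos) - k + 1))
    return narrowest / protein_length


# Illustrative (not COSMIC) residue positions in a hypothetical 800-residue protein.
clustered = [32, 33, 33, 34, 37, 41, 45, 300]        # hotspot pattern, PO-like
spread = [50, 150, 250, 350, 450, 550, 650, 750]      # spread-out pattern, TS-like
print(shortest_span_fraction(clustered, 800, 0.85))   # 0.0175
print(shortest_span_fraction(spread, 800, 0.85))      # 0.75125
```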