Bálint Mészáros, András Zeke, Attila Reményi, István Simon, Zsuzsanna Dosztányi.
Abstract
BACKGROUND: Recent advances in sequencing technologies enable the large-scale identification of genes that are affected by various genetic alterations in cancer. However, understanding tumor development requires insights into how these changes cause altered protein function and impaired network regulation in general and/or in specific cancer types.
Keywords: Cancer; Deletion; Driver gene; Insertion; Missense mutation; Protein functional modules; Somatic mutation
Year: 2016 PMID: 27150584 PMCID: PMC4858844 DOI: 10.1186/s13062-016-0125-6
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Fig. 1 Outline of the method. All local somatic mutations are collected from the COSMIC database for a given gene, discarding mutations coming from hypermutated samples (see Methods) and mutations overlapping with low complexity regions. Next, a seed region in the corresponding protein sequence is selected and assessed for significant enrichment of mutations compared to the expected random distribution using a one-sided Fisher’s exact test. If the selected region is significant (p-value < 0.01), its boundaries are moved to either side to locally maximize significance. This is repeated for all possible seed regions of 7, 10 and 30 residues in length. After the evaluation of all seed regions, the resulting optimized regions are merged if they overlap. For an exhaustive description of the algorithm see Additional file 1
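The steps outlined in Fig. 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation, which is described in Additional file 1. The form of the 2×2 contingency table (mutations inside/outside the region versus residues inside/outside the region), the greedy one-residue boundary moves, and all function names are assumptions made here for illustration.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of tables whose top-left cell
    is at least `a`, keeping all row and column totals fixed.
    """
    r1, r2, c1 = a + b, c + d, a + c
    upper = min(r1, c1)
    tail = sum(comb(r1, x) * comb(r2, c1 - x) for x in range(a, upper + 1))
    return tail / comb(r1 + r2, c1)


def region_p(counts, start, end):
    """Enrichment p-value for residues [start, end) of one protein.

    `counts[i]` is the number of mutations at residue i. The table layout
    (mutations in/out vs. residues in/out) is an assumption of this sketch.
    """
    inside = sum(counts[start:end])
    outside = sum(counts) - inside
    length = end - start
    return fisher_greater(inside, outside, length, len(counts) - length)


def optimize_region(counts, start, end):
    """Greedily move either boundary by one residue while the p-value drops."""
    best = region_p(counts, start, end)
    while True:
        moves = [(s, e)
                 for s, e in ((start - 1, end), (start + 1, end),
                              (start, end - 1), (start, end + 1))
                 if 0 <= s < e <= len(counts)]
        if not moves:
            break
        p, s, e = min((region_p(counts, s, e), s, e) for s, e in moves)
        if p >= best:
            break
        best, start, end = p, s, e
    return start, end, best


def merge_regions(regions):
    """Merge overlapping half-open intervals."""
    merged = []
    for s, e in sorted(regions):
        if merged and s < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def find_regions(counts, seed_lengths=(7, 10, 30), alpha=0.01):
    """Scan all seeds, optimize the significant ones, merge the results."""
    found = []
    for length in seed_lengths:
        for start in range(len(counts) - length + 1):
            if region_p(counts, start, start + length) < alpha:
                s, e, _ = optimize_region(counts, start, start + length)
                found.append((s, e))
    return merge_regions(found)
```

Calling `find_regions` on a per-residue list of mutation counts returns the merged regions as half-open `(start, end)` index pairs. The test is implemented with `math.comb` only, so the sketch has no dependencies outside the standard library.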
Summary of identified SiMPRes. Regions are grouped according to their significance level (see Methods) and their dominant mutation type
| Significance level / Dominant mutation type | Missense mutations | Insertions | Deletions | Total |
|---|---|---|---|---|
| High significance | 68 | 5 | 10 | 83 (15.5 %) |
| Medium significance | 77 | 0 | 13 | 90 (16.9 %) |
| Low significance | 285 | 9 | 67 | 361 (67.6 %) |
| Total | 430 (80.5 %) | 14 (2.6 %) | 90 (16.9 %) | 534 (100 %) |
Fig. 2 Comparison of the effectiveness and overlap between methods. Number of genes (a) identified by the different methods together with respective overlaps. The recovery rates of all three methods tested on the SCGD (b), oncogenes (c) and tumor suppressors (d)
Fig. 3 The ratio of proteins with significantly mutated regions in various classes of typical genetic alterations. Based on the KEGG annotation, genes are grouped according to the dominant genetic aberrations for a given gene. Red bars show the fraction of proteins with found regions, the horizontal black line shows the average ratio of 0.752. Numbers above bars show the number of known genes in the category. (LOH = Loss of heterozygosity)
Twenty-seven main cancer types. Cancer types are shown with the corresponding tissue/organ of occurrence and number of local somatic mutations and originating samples in COSMIC
| Tissue/organ | Cancer types | Number of local somatic mutations in COSMIC | Number of samples | Average number of mutations per sample |
|---|---|---|---|---|
| Bladder | Bladder cancer | 6 265 | 3 027 | 2.07 |
| Blood | Acute myeloid leukemia | 11 650 | 5 857 | 1.99 |
| | Chronic myeloid leukemia | 1 603 | 1 041 | 1.54 |
| | Lymphoma | 24 572 | 6 944 | 3.54 |
| Bone | Bone cancer | 2 814 | 706 | 3.99 |
| Brain | Glioblastoma | 9 361 | 2 909 | 3.22 |
| | Neuroblastoma | 4 932 | 619 | 7.97 |
| | Glioma | 12 793 | 2 947 | 4.34 |
| | Medulloblastoma | 4 701 | 728 | 6.46 |
| Breast | Breast cancer | 31 544 | 4 404 | 7.16 |
| Cervix | Cervical cancer | 136 | 132 | 1.03 |
| Colorectal | Colorectal cancer | 37 727 | 25 806 | 1.46 |
| Esophagus | Esophageal cancer | 3 659 | 433 | 8.45 |
| Head and neck | Thyroid cancer | 19 975 | 13 908 | 1.44 |
| | Head and neck carcinoma | 75 | 67 | 1.12 |
| Kidney | Renal cell carcinoma | 27 897 | 1 750 | 15.94 |
| Liver | Hepatocellular carcinoma | 19 100 | 2 091 | 9.13 |
| Lung | Small cell lung cancer | 976 | 208 | 4.69 |
| | Non-small cell lung cancer | 19 991 | 11 236 | 1.78 |
| Ovary | Ovarian cancer | 19 286 | 2 936 | 6.57 |
| Pancreas | Pancreatic cancer | 33 776 | 5 609 | 6.02 |
| Prostate | Prostate cancer | 18 813 | 967 | 19.46 |
| Skin | Melanoma | 12 374 | 7 923 | 1.56 |
| | Squamous cell carcinoma | 15 726 | 3 459 | 4.55 |
| | Basal cell carcinoma | 292 | 251 | 1.16 |
| Stomach | Stomach cancer | 7 795 | 1 430 | 5.45 |
| Uterus | Endometrial cancer | 5 722 | 1 627 | 3.52 |
| Total | | 353 555 | 109 015 | 3.24 |
Fig. 4 Overlap between cancer-related genes and significantly mutated regions. Blue bars show proteins that harbor at least one region annotated to the correct cancer type. Shades of blue show the significance level of the most significant region found. Grey bars show the number of proteins that harbor at least one region but where the region is annotated to a different cancer type. Red bars show the number of proteins without significant regions. Numbers above the bars show the number of mutations in the COSMIC database annotated to the cancer type. Order of cancer types reflects the decreasing number of known genes from left to right
Fig. 5 Proteins with multiple regions covering multiple cancer types. Symmetrically positioned boxes represent structural/functional protein units: grey – signal sequence, black – transmembrane region, various colors – domains (red = catalytic domains, blue = all other domains, with abbreviated names written in the box). Boxes above the line represent significantly mutated regions. Colors denote dominant mutation types: black – missense, red – deletions, blue – insertions. Regions are flagged with dominant cancer types together with the p-value of the region. GIST – Gastrointestinal stromal tumor, AML – Acute Myeloid Leukemia, CML – Chronic Myeloid Leukemia. All examples feature multiple regions involved in the same cancer types (a) or multiple cancer-specific regions (b)
The occurrence of SiMPRes in known structural/functional protein sites/regions. Color codes represent over- and under-representation compared to random. Shades of red show increasing over-representation. The amount of over- and under-representation is given in standard deviation units calculated from 1000 randomly assigned regions. ‘Regions of interest’ marks experimentally characterized protein regions that can be of interest concerning protein function (such as interaction sites, different regions of multifunctional enzymes or regions crucial for biological processes/sub-cellular localizations)
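The over- and under-representation measure described above, expressed in standard deviation units relative to 1000 randomly assigned regions, can be illustrated with a short Python sketch. How the random regions were assigned is not stated here, so the scheme below (each region is re-placed uniformly at random along the protein, keeping its length) is an assumption, as are the function names and the use of position counts as the overlap measure.

```python
import random
from statistics import mean, pstdev


def overlap(regions, sites):
    """Count annotated site positions that fall inside any half-open region."""
    return sum(any(s <= pos < e for s, e in regions) for pos in sites)


def representation_z(regions, sites, protein_len, n_random=1000, seed=0):
    """Observed overlap in standard deviation units of a random background.

    Assumed null model: every region keeps its length but is placed at a
    uniformly random start position within the protein.
    """
    rng = random.Random(seed)
    observed = overlap(regions, sites)
    null = []
    for _ in range(n_random):
        placed = []
        for s, e in regions:
            length = e - s
            new_start = rng.randrange(protein_len - length + 1)
            placed.append((new_start, new_start + length))
        null.append(overlap(placed, sites))
    sd = pstdev(null)
    # With zero spread in the background the score is undefined.
    return (observed - mean(null)) / sd if sd > 0 else float("nan")
```

A positive score indicates over-representation of the regions at the annotated sites and a negative score indicates under-representation, matching the sign convention of the color code.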
Medium significance region genes that are absent from all somatic cancer gene databases
| Gene | Region | p-value | Dominant cancer type(s) | Protein name | Protein annotations | Region annotations | Indication of involvement in cancer |
|---|---|---|---|---|---|---|---|
| WASH3P | 368–410 | 1.050 × 10^-14 | Renal cell carcinoma | Putative WAS protein family homolog 3 | Pseudogene homolog of WASP, nucleation-promoting factor of endosomes | Missense mutations affect mainly one position. Region is part of a Pfam-B family conserved across a wide range of eukaryotes and is probably disordered | Some indication of possible involvement in tumors (PMID: 21208217) |
| FRG1B/C20orf80 | 40–101 | 7.492 × 10^-14 | Prostate cancer, Glioma | Protein FRG1B | Unknown | Well distributed missense mutations in the structured FRG1 domain; no known function, but conserved across eukaryotes and some bacteria | Only based on mutation pattern, no cancer-specific annotations |
| ANKRD36C/ENSG00000174501 | 626–634 | 2.620 × 10^-12 | Prostate cancer, Glioma | Ankyrin repeat domain-containing protein 36C | Unknown | Well clustered missense mutation peaks in an unannotated, possibly disordered region of the protein | Very preliminary indication of possible role in various cancer types |
| ZNF814 | 337–337 | 6.451 × 10^-11 | Pancreatic cancer, Squamous cell carcinoma | Putative uncharacterized zinc finger protein 814 | Acts as a transcription factor with specific DNA binding | Sharp peak of missense mutations N-terminal of the zinc binding domains | Very preliminary indication of possible role in some cancer types |
| RP1L1 | 1305–1361 | 1.654 × 10^-9 | Various | Retinitis pigmentosa 1-like 1 protein | Involved in axoneme assembly, photoreceptor cell development and retina development in camera-type eye | Broad peak of missense mutations and indels in the central, possibly disordered region of the protein | Indication of involvement in gastric and colorectal cancers (PMID: 23237666) |
| RRN3P2/ENSG00000103472 | 368–375 | 1.676 × 10^-9 | Prostate cancer | RRN3 homolog, RNA polymerase I transcription factor pseudogene 2 | Unknown | Sharp peak of missense mutations in the RRN3 domain | Unknown |
| MUC6 | 1873–1995 | 3.663 × 10^-7 | Prostate cancer | Mucin 6 | Modulates the composition of the protective mucus layer. Important in the cytoprotection of epithelial surfaces, used as tumor markers in a variety of cancers. May play a role in epithelial organogenesis. | Broad peak of missense mutations in a possibly disordered region of the protein | Known to be linked to various forms of cancer (PMID: 21851820, PMID: 9650551) |
| EEF1B2 | 43–43 | 3.739 × 10^-7 | Prostate cancer | Elongation factor 1-beta | Translation elongation factor, guanine nucleotide exchange factor involved in the transfer of aminoacylated tRNAs to the ribosome | Sharp peak of missense mutations in the N-terminal region of the protein | Unknown |
| POTEC | 477–511 | 4.504 × 10^-7 | Prostate cancer | POTE ankyrin domain family member C | Unknown | Multiple peaks of missense mutations in the C-terminal disordered part of the protein, encompassing a possible DNA binding motif | Unknown |
| EIF1AX | 2–15 | 1.457 × 10^-6 | Thyroid cancer, Melanoma | Eukaryotic translation initiation factor 1A, X-chromosomal | Required for maximal rate of protein biosynthesis, enhances ribosome dissociation | N-terminal disordered region, harboring many missense mutations | Indication of involvement in melanoma (PMID: 24423917) |
| CS | 183–187 | 1.670 × 10^-6 | Bile duct/gallbladder cancer | Mitochondrial citrate synthase | Involved in step 1 of the subpathway that synthesizes isocitrate from oxaloacetate | Well localized peaks of missense mutations in the citrate synthase domain | Indication of involvement in some cancers (PMID: 19647716) |
| RGPD8 | 1760–1760 | 2.200 × 10^-6 | Prostate cancer, Glioma | RANBP2-like and GRIP domain-containing protein 8 | Unknown | Single peak of missense mutations at the C-terminal, possibly disordered region | Very preliminary indication of possible marker role in some cancer types |
| KRTAP4-9 | 57–57 | 3.407 × 10^-6 | Breast cancer | Keratin-associated protein 4-9 | Part of an interfilamentous matrix, in which hair keratin intermediate filaments are embedded | Peak of missense mutations | Located in a potential breakpoint initiating ERBB2 amplification, which is known to be involved in breast cancer (PMID: 23181561) |
| KRTAP4-8 | 95–95 | 5.261 × 10^-6 | Glioma | Keratin-associated protein 4-8 | | Peak of missense mutations | |
| KRTAP9-9 | 18–30 | 9.921 × 10^-6 | Pancreatic cancer, Breast cancer | Keratin-associated protein 9-9 | | Short region dominated by indels | |
Fig. 6 Connection between various genetic alterations and significantly mutated regions. Rows correspond to various types of genetic alterations. Columns from left to right show normal protein function, protein function modulated by the given genetic alteration and protein function modulated by the occurrence of significantly mutated regions