Paul Ashford, Camilla S M Pang, Aurelio A Moya-García, Tolulope Adeyelu, Christine A Orengo.
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams), which are structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer-associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include the recently validated cancer-associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Year: 2019 PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. CATH-FunFams form the basis of our mutation analysis pipeline and are central to the workflow used in this analysis of cancer mutations (A). CATH-Superfamilies, or equivalent Pfam families, may include sequences with diverse functions (B, top). FunFams subdivide families into more functionally specific sets of sequences (B, bottom), allowing detection of mutation enrichment specific to FunFam1 that is not significant when mutations in FunFam1 and FunFam2 are aggregated at the family level.
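The dilution effect described in the Figure 1 caption can be illustrated with a small Python sketch. This is not the authors' pipeline: the enrichment statistic (observed count over a length-proportional expected count), the binomial tail test, and every mutation count and domain length below are assumptions chosen only to show how an enrichment specific to FunFam1 can disappear once FunFam1 and FunFam2 are pooled at the family level.

```python
from math import comb


def enrichment_factor(observed, total_mutations, region_len, background_len):
    """Observed mutations divided by the count expected from region length alone."""
    expected = total_mutations * region_len / background_len
    return observed / expected


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers (assumptions, not data from the paper).
TOTAL_MUTATIONS = 200      # mutations observed across the whole background
BACKGROUND_LEN = 20_000    # residues in the background

funfam1 = {"observed": 30, "length": 1_000}
funfam2 = {"observed": 10, "length": 3_000}
pooled = {
    "observed": funfam1["observed"] + funfam2["observed"],
    "length": funfam1["length"] + funfam2["length"],
}

ef_funfam1 = enrichment_factor(funfam1["observed"], TOTAL_MUTATIONS,
                               funfam1["length"], BACKGROUND_LEN)
ef_pooled = enrichment_factor(pooled["observed"], TOTAL_MUTATIONS,
                              pooled["length"], BACKGROUND_LEN)

# One-sided test: is the region hit more often than its length predicts?
p_funfam1 = binomial_tail(funfam1["observed"], TOTAL_MUTATIONS,
                          funfam1["length"] / BACKGROUND_LEN)
p_pooled = binomial_tail(pooled["observed"], TOTAL_MUTATIONS,
                         pooled["length"] / BACKGROUND_LEN)

print(f"FunFam1 enrichment: {ef_funfam1:.2f}, p = {p_funfam1:.2e}")
print(f"Pooled  enrichment: {ef_pooled:.2f}, p = {p_pooled:.2f}")
```

With these toy values FunFam1 is hit three times more often than its length predicts, while the pooled family is hit exactly as often as expected, so a family-level test would miss the signal.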
Figure 2. Heatmap shows overall diversity of functions affected in different cancers and clustering of certain cancers by primary tumour site. Each horizontal bar represents a MutFam, with enrichment factors shown using colour intensity. MutFams that include tumour suppressor genes TP53 and PTEN are shared amongst the largest number of cancers. Tumours from the same primary site that cluster by their MutFams are shown in more detail for gliomas (top right) and colorectal cancers (bottom right) and include the CATH Functional Family names. For gliomas, MutFams include genes TP53, PTEN and CHEK2 found in all three subtypes (LGG, GBM and GLI), with genes such as EGFR and ZNF429 found only in GLI and late-stage GBM. For colorectal cancers, common MutFams include TP53, PTEN, KRAS, PIK3CA and the F-box/WD repeat-containing FBXW7, with COAD containing many unique MutFams including genes EGFR, NRAS and relaxin receptor 2 (TSHR). A full list of MutFams is given in Supplementary Table 2. BLCA Bladder cancer; BRCA Breast invasive carcinoma; COAD Colon adenocarcinoma; DLBC Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; ESCA Esophageal carcinoma; GBM Glioblastoma multiforme; GLI Gliomas; KIRC Kidney renal clear cell carcinoma; LAML Acute Myeloid Leukemia; LGG Low grade gliomas; LIHC Liver hepatocellular carcinoma; LUAD Lung adenocarcinoma; LUSC Lung squamous cell carcinoma; OV Ovarian serous cystadenocarcinoma; PAAD Pancreatic adenocarcinoma; PRAD Prostate adenocarcinoma; READ Rectum adenocarcinoma; SKCM Skin Cutaneous Melanoma; STAD Stomach adenocarcinoma; THCA Thyroid carcinoma; UCEC Uterine Corpus Endometrial Carcinoma; UCS Uterine Carcinosarcoma; POLY Polymorphisms (neutral mutations).
Figure 3. MutFam genes are significantly enriched in CGC genes. The Pfam-based method (Miller) is also enriched in CGC genes and both methods predict distinct drivers.
Figure 4. Genes uniquely identified from MutFams are enriched in 16 GO-Slim terms representing five main cellular processes. GO-Slim terms in common with Miller are shown in bold.
Figure 5. Mutations in CHEK2 cluster near specific functional sites. CHEK2 is a checkpoint kinase regulating genome integrity by arresting cell-cycle progression following DNA damage. The phosphotransferase domain shown is representative of a MutFam identified in gliomas and glioblastoma, “Calcium/calmodulin-dependent protein kinase type II” [PDB ID: 2ycf]. Mutation clusters A and B (red) are within 5 Å of both known and predicted functional sites (FunSites). Cluster A (centred on residue 355) is close to the ATP binding pocket, for which Catalytic Site Atlas (CSA) residues are highlighted green. Cluster B (centred around residues 392, 394 and 396) is in the activation loop near the APE motif, involved in kinase activation, which involves transient CHEK2 dimerization and autophosphorylation. Further CSA residues near cluster B are highlighted green. Clustering of cancer mutations near catalytic residues implies a reduction of CHEK2 kinase activity, preventing it from inhibiting cell-cycle progression in response to DNA damage and helping to drive tumourigenesis.
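The 5 Å proximity criterion in the Figure 5 caption amounts to a nearest-neighbour distance check in 3D. The following Python sketch shows one way such a check could look. It is an illustration under stated assumptions: the caption does not say which atoms the distance is measured between, so one representative coordinate per residue is assumed, and the residue labels and coordinates are hypothetical placeholders, not values taken from PDB entry 2ycf.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_site_distance(residue_xyz, site_coords):
    """Distance (in Å) from one residue to its closest functional-site residue."""
    return min(dist(residue_xyz, site_xyz) for site_xyz in site_coords.values())


def near_functional_site(mutation_coords, site_coords, cutoff=5.0):
    """Map each mutated residue to True if it lies within `cutoff` Å of any site."""
    return {
        label: nearest_site_distance(xyz, site_coords) <= cutoff
        for label, xyz in mutation_coords.items()
    }


# Hypothetical coordinates in Å (assumptions for illustration only).
functional_sites = {
    "site_1": (3.0, 0.0, 0.0),
    "site_2": (0.0, 20.0, 0.0),
}
mutated_residues = {
    "mut_near": (0.0, 0.0, 0.0),   # 3 Å from site_1
    "mut_far": (10.0, 0.0, 0.0),   # 7 Å from site_1, further from site_2
}

flags = near_functional_site(mutated_residues, functional_sites)
print(flags)
```

In practice the coordinates would be read from a structure file and the choice of atoms (for example Cα only versus any heavy atom) changes which residues pass the cutoff, so that choice should be stated alongside any reported threshold.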
Figure 6. Proximity distributions show that 3D clustered cancer mutations are closer to functional sites than unfiltered mutations. For known functional sites (CSA, Ligand or PPI) 3D clustered mutations are closer than unfiltered mutations from either oncogenes or TSGs. For highly conserved residues in FunFam multiple sequence alignments highly enriched in known functional sites (FunSites), clustered mutations are significantly closer than unfiltered oncogene cancer mutations (there were too few distances measured to plot distributions for FunSites/TSGs). All distributions use UniProt neutral as a control.
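The comparison of proximity distributions in the Figure 6 caption can be sketched as a one-sided permutation test on the difference in medians. The caption does not name the statistical test that was used, so the permutation test here is a stand-in rather than the authors' method, and the distance values are invented for illustration.

```python
import random
from statistics import median


def permutation_p_value(closer, control, n_perm=2000, seed=0):
    """One-sided p-value for the hypothesis that `closer` has a smaller median.

    Group labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose median difference is at least as large as the observed one
    (with the usual +1 correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = median(control) - median(closer)
    pooled = list(closer) + list(control)
    n_closer = len(closer)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = median(pooled[n_closer:]) - median(pooled[:n_closer])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical distances to the nearest functional site, in Å (assumptions).
clustered = [2.1, 3.0, 3.4, 4.2, 4.8, 5.5, 6.0, 6.3]
unfiltered = [7.5, 9.1, 10.4, 12.0, 13.3, 15.8, 18.2, 21.0]

p_value = permutation_p_value(clustered, unfiltered)
print(f"median clustered = {median(clustered):.2f} Å, "
      f"median unfiltered = {median(unfiltered):.2f} Å, p = {p_value:.4f}")
```

A permutation test is used here because it makes no assumption about the shape of the distance distributions; a rank-based test would be a reasonable alternative.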
Summary of protein functions and cancers for the top mutated MutFam genes having some other supporting evidence. Genes in bold were also identified in either CGC or Miller.
| General Function | Genes | Cancers |
|---|---|---|
| Apoptosis | | DLBC |
| Chromatin SWI/SNF complex | | GLI, PAAD |
| Chromatin other | | UCEC |
| Genome integrity | | BLCA, BRCA, COAD, DLBC, ESCA, GBM, GLI, KIRC, LAML, LGG, LIHC, LUAD, LUSC, OV, PAAD, PRAD, READ, SKCM, STAD, UCEC, UCS |
| MAPK signalling | | COAD, GLI, LGG, LUAD, READ, SKCM, THCA |
| Metabolism | | GLI, LAML, LGG, SKCM, STAD, UCEC |
| NFKB signalling | | DLBC |
| NOTCH signalling | | GLI, LGG, LUAD |
| PI3K signalling | | BLCA, BRCA, COAD, ESCA, GBM, GLI, KIRC, LGG, LIHC, LUAD, LUSC, READ, SKCM, UCEC |
| RTK signalling | | BLCA, BRCA, COAD, GBM, GLI, LAML, LUAD, OV, SKCM |
| TGFB signalling | | COAD, ESCA, PAAD, READ, SKCM, STAD |
| TOR signalling | | LUAD |
| Signalling including RAS | | BLCA, BRCA, COAD, KIRC, LAML, LIHC, LUAD, PAAD, READ, SKCM, STAD, THCA, UCEC |
| Cadherins | | BLCA, BRCA, COAD, ESCA, GLI, LAML, LIHC, LUAD, LUSC, OV, PRAD, READ, SKCM, STAD, THCA |
| Protein homeostasis/ubiquitination | | BLCA, COAD, KIRC, LUSC, PRAD, READ, SKCM, STAD, UCEC, UCS |
| Splicing | | BRCA, PAAD |
| Transcription factors | | BLCA, BRCA, COAD, DLBC, GBM, GLI, LGG, LIHC, LUSC, READ, SKCM, THCA, UCEC |
| Other | | BLCA, COAD, LIHC, LUAD, READ, SKCM, THCA |