Li Chen1, Yanyan Miao1, Mengni Liu1, Yanru Zeng1, Zijun Gao1, Di Peng1, Bosu Hu1, Xu Li2, Yueyuan Zheng1, Yu Xue3, Zhixiang Zuo1, Yubin Xie1, Jian Ren1.
Abstract
Large-scale tumor genome sequencing projects have revealed a complex landscape of genomic mutations in multiple cancer types. A major goal of these projects is to characterize somatic mutations and discover cancer drivers, thereby providing important clues to diagnostic or therapeutic targets for clinical treatment. However, distinguishing the few driver mutations from the majority of passenger mutations remains a major challenge. Combining mutations with other functional features to predict cancer driver genes is an effective way to address this problem. Protein lysine modifications are an important functional feature that regulates the development of cancer. Therefore, in this work, we systematically analyzed somatic mutations affecting seven types of protein lysine modification and identified several important drivers that are responsible for tumorigenesis. From the published literature, we first collected more than 100,000 lysine modification sites for analysis. We then downloaded 1 million non-synonymous single nucleotide variants (SNVs) from TCGA and mapped them to the collected lysine modification sites. To identify driver proteins with significantly altered lysine modifications, we further developed a hierarchical Bayesian model and applied the Markov Chain Monte Carlo (MCMC) method for testing. Strikingly, the coding sequences of 473 proteins were found to carry a higher mutation rate in lysine modification sites than in other background regions. Hypergeometric tests also revealed that these gene products were enriched in known cancer drivers. Functional analysis suggested that mutations within the lysine modification regions showed higher evolutionary conservation and deleteriousness. Furthermore, pathway enrichment showed that mutations on lysine modification sites mainly affected cancer-related processes, such as cell cycle and RNA transport.
Moreover, clinical studies also suggested that the driver proteins were significantly associated with patient survival, implying an opportunity to use lysine modifications as molecular markers in cancer diagnosis or treatment. By searching within protein-protein interaction networks using a random walk with restart (RWR) algorithm, we further identified a series of potential treatment agents and therapeutic targets for cancer related to lysine modifications. Collectively, this study reveals the functional importance of lysine modifications in cancer development and may benefit the discovery of novel mechanisms for cancer treatment.
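The hypergeometric enrichment test mentioned in the abstract can be illustrated with a minimal sketch. It computes the upper-tail probability of seeing at least the observed overlap between a predicted gene set and a list of known cancer drivers. The function name is hypothetical, and every number in the example except the 473 predicted proteins is an assumed placeholder rather than a value reported in the study.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail p-value P(X >= observed) for a hypergeometric variable.

    population: total number of genes considered (background)
    successes:  number of known cancer drivers in the background
    draws:      number of predicted driver proteins
    observed:   number of predicted proteins that are known drivers
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 473 comes from the abstract; the other three values are hypothetical.
    p = hypergeom_enrichment_p(population=20000, successes=600,
                               draws=473, observed=50)
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value here would indicate that the predicted set overlaps the known-driver list more than expected by chance; the background size strongly affects the result and has to be chosen to match the genes that were actually testable.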
Keywords: cancer; clinical analysis; lysine modifications; pathway and network analysis; somatic mutations
Year: 2018 PMID: 30065750 PMCID: PMC6056651 DOI: 10.3389/fgene.2018.00254
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. (A) The number of proteins with lysine modifications collected from the published literature. (B) The number of lysine modification sites collected in this study. (C) Distribution of cancer samples and somatic mutations collected across 12 cancer types. (D) The count of mutated PTM sites across 12 cancer types. (E) The count of mutated PTM motifs across 12 cancer types. The proportion of mutated lysine modification motifs is shown above the bar plot.
Figure 2. An overview of the analysis model. Lysine modification sites were collected from the published literature. Somatic mutations were downloaded from TCGA, ICGC, and COSMIC. A hierarchical Bayesian model was then constructed to identify proteins with mutations that were significantly enriched in PTM regions. Downstream analysis was also performed to reveal the mechanism of lysine modification-related mutations in cancers.
Figure 3. (A) The heatmap shows the number of significantly mutated lysine modification-related proteins across 7 modification types in 12 cancers. (B) The 25 driver proteins that were mutated in more than one cancer type are shown in the Circos plot. The width of the lines that connect mutated proteins to cancer types denotes the log10 value of the fold change between modification regions and background regions. Different colors represent different cancer types. (C) Oncoprint for lysine modification-related mutations in UCEC. The number of mutations in each patient or protein is visualized in the bar graph.
Figure 4. (A) Distribution of lysine modification-related mutations in MACF1 across the top five cancer types. (B) The lysine modification regions and number of flanking modified sites per residue (orange) in MACF1. (C) The number of mutations per residue in MACF1. The domain organization of MACF1 is shown below the chart.
Figure 5. (A) The box plot shows the differences in mutation rates in the domain regions and disorder regions. (B) The cumulative distribution function of the predicted conservation scores in lysine modification-related mutations and other mutations. A Kolmogorov-Smirnov test was applied to examine their statistical differences. (C) The deleteriousness of lysine modification-related mutations and other mutations. A two-tailed population test was applied to evaluate the differences. (D) The subcellular localization of the driver proteins that carried a high rate of lysine modification-related mutations.
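The Kolmogorov-Smirnov comparison in panel (B) rests on the largest vertical gap between two empirical cumulative distribution functions. The following is a minimal sketch of that statistic only, assuming two plain lists of conservation scores; the function name and the example scores are hypothetical, and obtaining a p-value would additionally require the KS distribution from a statistics library.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Hypothetical conservation scores, for illustration only.
    ptm_related = [0.91, 0.85, 0.97, 0.78, 0.88]
    other = [0.42, 0.65, 0.81, 0.30, 0.55]
    print("KS statistic:", ks_statistic(ptm_related, other))
```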
Figure 6. (A) The enriched GO terms and KEGG pathways obtained from the identified lysine modification-related driver proteins. (B) The result of Reactome pathways analysis on the predicted driver proteins. (C) Kaplan–Meier plots comparing the overall survival rates between patients with lysine modification-related mutations and patients without mutations in liver cancer and (D) thymic carcinoma.
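For readers unfamiliar with the curves in panels (C) and (D), a Kaplan-Meier estimate multiplies, at each observed death time, the fraction of at-risk patients who survive it. The sketch below is a generic illustration under the assumption of right-censored follow-up data; the function name and the toy cohort are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times:  follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical cohort: months of follow-up and death indicators.
    for t, s in kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

Comparing two such curves, for example mutated versus non-mutated patients, is usually done with a log-rank test, which is not shown here.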
Figure 7. (A) Enrichment Map showing the annotated pathways in the whole network. Each node represents a specific pathway, and edges connect pathways with common genes. (B) The RWR analysis result for 7 types of lysine modifications. The identified driver proteins were taken as initial seeds in the RWR process. The predicted targets are labeled in green, the known cancer driver genes are labeled with red circles, and enriched pathways are marked with colored shading.
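The random walk with restart used in panel (B) can be sketched as a simple power iteration: at each step the walker either jumps back to a seed protein with a fixed restart probability or moves to a random neighbor. This is a generic illustration, not the authors' implementation; the function name, the restart probability of 0.3, the convergence tolerance, and the toy network are all assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk restarting at the seeds.

    adjacency: dict mapping each node to a list of its neighbors
               (undirected network; every neighbor must also be a key)
    seeds:     set of seed nodes, all present in adjacency
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart step: a fixed share of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            neighbors = adjacency[n]
            if not neighbors:
                # Isolated node: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += walk_mass / len(seeds)
            else:
                share = walk_mass / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical four-protein chain A - B - C - D with A as the seed.
    network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    scores = random_walk_with_restart(network, {"A"})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Non-seed nodes with high scores are the ones most closely connected to the seeds, which is the sense in which such a walk can nominate candidate targets near known driver proteins.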