A Gress, V Ramensky, O V Kalinina.
Abstract
Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain a lot of sequence variation that ranges from single-nucleotide variants (SNVs) to large-scale structural rearrangements. To establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations. Thus, alternative computational tools are needed. A possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This is impeded by the lack of available protein 3D structures. Complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs): those associated with cancer, those associated with non-cancer diseases, and putatively functionally neutral ones. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for proteins harboring the corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and for 51% and 66% of proteins carrying common polymorphisms and annotated benign variants, respectively. Many mutations caused by nsSNVs can be found in protein-protein, protein-nucleic acid or protein-ligand complexes. Correction for the number of available templates per protein reveals that protein-protein interaction interfaces are not enriched in either cancer nsSNVs or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in DNA-interacting interfaces in these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in the protein core, where they probably affect overall protein stability.
Year: 2017 PMID: 28945216 PMCID: PMC5623905 DOI: 10.1038/oncsis.2017.79
Source DB: PubMed Journal: Oncogenesis ISSN: 2157-9024 Impact factor: 7.485
Data sets in this study
| Data set | Source | nsSNVs (proteins) | nsSNVs, % (proteins, %) | nsSNVs, % (proteins, %) |
|---|---|---|---|---|
| Cancer germline | ClinVar, Uniprot | 452 (86) | 360, 79.6% (58, 67.4%) | 450, 99.6% (86, 100%) |
| Cancer germline randomized | n/a | 443±3 (86) | 268±11, 60.5% (44±2, 51.2%) | 318±12, 71.8% (86, 100%) |
| Cancer somatic | ClinVar, Uniprot, COSMIC | 3673 (371) | 2952, 80.4% (246, 66.3%) | 3660, 99.6% (371, 100%) |
| Cancer somatic randomized | n/a | 3572±8 (371) | 1983±17, 55.5% (192±4, 51.7%) | 2398±21, 67.1% (371, 100%) |
| Non-cancer diseases | ClinVar, Uniprot | 14983 (1586) | 9678, 64.6% (795, 50.1%) | 14386, 96.0% (1586, 100%) |
| Non-cancer diseases randomized | n/a | 14431±23 (1586) | 7800±35, 54.0% (748±8, 47.2%) | 10982±54, 76.1% (1586, 100%) |
| Common | ExAC | 27326 (10261) | 2048, 7.5% (1038, 10.1%) | 6048, 22.1% (5251, 51.1%) |
| Common randomized | n/a | 27214±11 (10261) | 2425±19, 8.9% (1260±19, 12.3%) | 7091±57, 26.1% (5251, 51.1%) |
| Benign | ClinVar | 5186 (962) | 658, 12.7% (208, 21.6%) | 1166, 22.5% (634, 65.9%) |
| Benign randomized | n/a | 5134±8 (962) | 765±21, 14.9% (239±4, 24.8%) | 1387±18, 27.0% (634, 65.9%) |
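The percentages in the table follow directly from the counts. As a quick arithmetic check, the sketch below recomputes the percentage in the fourth column from the total in the third column for the five biological data sets. The numbers are copied from the table; the variable and function names are illustrative and not taken from the study.

```python
# Counts copied from the table above:
# (total nsSNVs in the third column, nsSNV count in the fourth column).
# Names are illustrative assumptions, not from the original study.
counts = {
    "Cancer germline": (452, 360),
    "Cancer somatic": (3673, 2952),
    "Non-cancer diseases": (14983, 9678),
    "Common": (27326, 2048),
    "Benign": (5186, 658),
}


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Recompute the percentage for every data set.
fractions = {name: percent(part, total) for name, (total, part) in counts.items()}

for name, value in fractions.items():
    print(f"{name}: {value}%")
```

The recomputed values agree with the percentages listed in the table, which confirms that each percentage is taken relative to the total number of nsSNVs in the same row.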
Figure 1. Distance between residues corresponding to nsSNVs and the nearest interaction partner (log scale). Biological data sets are shown in a darker shade. The fraction of mapped nsSNVs for which a template with a co-resolved corresponding interaction partner exists is given below the boxes, which represent the distribution of distances to protein, ligand and DNA interaction partners for each biological data set. For randomized data sets, all 10 replicas are used to create the plots. (a) Distances to the nearest protein chain. (b) Distances to the nearest ligand. (c) Distances to the nearest DNA chain.
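The distance to the nearest interaction partner can be illustrated with a minimal sketch. It assumes the distance is the smallest Euclidean distance between any atom of the mutated residue and any atom of the partner; the exact definition used in the study is not given in this record. The coordinates are hypothetical and the function name is illustrative.

```python
import math

# Hypothetical 3D atom coordinates in angstroms; real values would be read
# from a structure file. These are NOT taken from the study.
residue_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
partner_atoms = [(4.5, 4.0, 0.0), (10.0, 0.0, 0.0)]


def nearest_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in atoms_a and any in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


d = nearest_distance(residue_atoms, partner_atoms)
# The figure uses a log scale, so the log10 of the distance is printed as well.
print(f"nearest distance: {d:.2f} A, log10: {math.log10(d):.2f}")
```

An all-pairs minimum is quadratic in the number of atoms, which is acceptable for a single residue against one partner; a spatial index would be the usual choice for whole-structure scans.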
Figure 2. Chemical difference between wild-type and mutated residues. Gray bars indicate biological data sets, light-gray bars indicate randomized data sets. Chemical distance is calculated as the Euclidean distance between the end points of the vectors representing the five most important numerical descriptors of physical and chemical properties[80] of the wild-type and mutant amino acids.
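The chemical distance described in this caption is a Euclidean distance between two five-component vectors. The sketch below shows the computation only. The descriptor values are invented placeholders, not the published descriptors from reference [80], and the keys `aa1` and `aa2` stand in for real amino acid codes.

```python
import math

# Placeholder five-component descriptor vectors. These numbers are invented
# for illustration and are NOT the descriptor values from reference [80].
descriptors = {
    "aa1": (0.0, 0.0, 0.0, 0.0, 0.0),
    "aa2": (1.0, 2.0, 2.0, 0.0, 4.0),
}


def chemical_distance(wild_type: str, mutant: str) -> float:
    """Euclidean distance between the descriptor vectors of two amino acids."""
    return math.dist(descriptors[wild_type], descriptors[mutant])


dist = chemical_distance("aa1", "aa2")
print(f"chemical distance: {dist:.2f}")
```

With real descriptor vectors, a substitution between chemically similar residues would give a small distance and a substitution between dissimilar residues a large one.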
Figure 3. Spatial distribution of nsSNVs in the analyzed data sets. For randomized data sets, mean values over 10 replicas are used. (a) For templates with ⩾35% sequence identity. (b) For templates with ⩾90% sequence identity.
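The two sequence identity cutoffs in this caption can be illustrated with a small filtering sketch. It assumes identity is the share of identical residues among aligned columns where neither sequence has a gap, which is one common convention; the definition used in the study is not stated in this record. The template names and aligned sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical pairwise alignments: (query, template). Not from the study.
alignments = {
    "tmpl_a": ("ACDEFGHIKL", "ACDEFGHIKL"),
    "tmpl_b": ("ACDEFGHIKL", "ACDEYGHLKV"),
    "tmpl_c": ("ACDEFGHIKL", "AWWWWWWWWL"),
}

identities = {name: percent_identity(q, t) for name, (q, t) in alignments.items()}

# Apply the two cutoffs named in the caption.
passing_35 = sorted(name for name, ident in identities.items() if ident >= 35.0)
passing_90 = sorted(name for name, ident in identities.items() if ident >= 90.0)

print("identity >= 35%:", passing_35)
print("identity >= 90%:", passing_90)
```

The stricter cutoff keeps only near-identical templates, so it trades coverage for confidence in the residue-level mapping.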
Figure 4. Protein complexes with nsSNVs in multiple subunits. (a) Mitochondrial respiratory complex II (mapped onto a homologous complex from porcine heart, PDB id 1ZOY) and the corresponding sub-network (see text). FAD-binding protein is shown in green, mutations therein in pink; iron–sulfur protein is shown in cyan, mutations therein in orange; large cytochrome binding protein is shown in magenta, mutations therein in purple; small cytochrome binding protein is shown in yellow, mutation therein in limegreen. In the sub-network, nodes correspond to individual proteins, edges depict interactions between them. (b) Sub-network corresponding to complexes of CDK6 with its inhibitors CDKN2A and CDKN2C. Stoichiometry of the complexes is not accounted for, and nodes with a single loop edge correspond to associations of multiple identical subunits. (c) Sub-network corresponding to NRas, KRas and HRas and their downstream kinase RAF1 and activity factors SOS1 and PLCE1. (d) PIK3CA-PIK3R1 complex with mutations corresponding to cancer-associated somatic nsSNVs (top) and to nsSNVs associated with non-cancer diseases (bottom), PDB id 4L1B and the PIK3CA-PIK3R1 sub-network. PIK3CA subunit is shown in green, mutations therein in magenta and purple. PIK3R1 subunit is shown in cyan, mutations therein in orange and red.
Figure 5. Contacts and distance distributions for oncogenes and tumor-suppressor genes (TSG). (a) Distribution of nsSNVs into structural classes. (b–d) Distances to the nearest interaction partners: (b) protein chain, (c) ligand, (d) DNA chain.
Top 20 ReactomeDB pathways identified in differential analysis of disease-associated data sets compared with the set of common variants
| | | |
|---|---|---|
| Regulation of TP53 activity through phosphorylation | PIP3 activates AKT signaling | Neutrophil degranulation |
| Ub-specific processing proteases | Oxidative stress-induced senescence | Intrinsic pathway of fibrin clot formation |
| TP53 regulates transcription of DNA repair genes | Factors involved in megakaryocyte development and platelet production | Glycosphingolipid metabolism |
| G2/M DNA damage checkpoint | Oncogene-induced senescence | Gap junction assembly |
| Recruitment and ATM-mediated phosphorylation of repair and signaling proteins at DNA double strand breaks | Ub-specific processing proteases | Urea cycle |
| Factors involved in megakaryocyte development and platelet production | Ovarian tumor domain proteases | Platelet degranulation |
| PIP3 activates AKT signaling | Regulation of TP53 degradation | Oligomerization of connexins into connexons |
| Stabilization of p53 | Regulation of TP53 activity through phosphorylation | Transport of connexins along the secretory pathway |
| Regulation of TP53 activity through methylation | Pre-NOTCH transcription and translation | Galactose catabolism |
| Regulation of TP53 degradation | Recruitment and ATM-mediated phosphorylation of repair and signaling proteins at DNA double strand breaks | Transport of gamma-carboxylated protein precursors from the endoplasmic reticulum to the Golgi apparatus |
| Formation of senescence-associated heterochromatin foci (SAHF) | Association of TriC/CCT with target proteins during biosynthesis | Removal of aminoterminal propeptides from gamma-carboxylated proteins |
| Oncogene-induced senescence | TP53 regulates transcription of DNA repair genes | Gamma-carboxylation of protein precursors |
| Oxidative stress-induced senescence | G2/M DNA damage checkpoint | Extrinsic pathway of fibrin clot formation |
| DNA damage/telomere stress-induced senescence | TP53 regulates metabolic genes | Common pathway of fibrin clot formation |
| SUMOylation of transcription factors | Regulation of TP53 activity through methylation | Striated muscle contraction |
| Activation of NOXA and translocation to mitochondria | Regulation of TP53 activity through acetylation | RAF/MAP kinase cascade |
| Regulation of TP53 activity through acetylation | TP53 regulates transcription of genes involved in Cytochrome C release | Regulation of gene expression in beta cells |
| Transcriptional activation of cell cycle inhibitor p21 | Stabilization of p53 | Phenylalanine and tyrosine catabolism |
| PI5P regulates TP53 acetylation | Regulation of TP53 activity through association with co-factors | Signaling by BRAF and RAF fusions |
| TP53 regulates transcription of additional cell cycle genes whose exact role in the p53 pathway remain uncertain | DNA damage/telomere stress-induced senescence | Signaling by RAS mutants |
Differences of the combined scores (see Materials and methods) for disease-associated nsSNVs and common variants are shown in parentheses.
GO-term enrichment analysis: top 20 terms in the 'Process' category
| | | |
|---|---|---|
| Positive regulation of transcription, DNA-templated | Negative regulation of cell proliferation | Positive regulation of transcription, DNA-templated |
| Negative regulation of cell proliferation | Positive regulation of transcription, DNA-templated | Cell–cell signaling |
| Negative regulation of transcription from RNA polymerase II promoter | Negative regulation of transcription from RNA polymerase II promoter | Response to drug |
| Negative regulation of apoptotic process | Negative regulation of apoptotic process | Blood coagulation |
| Regulation of transcription, DNA-templated | Positive regulation of transcription from RNA polymerase II promoter | Positive regulation of transcription from RNA polymerase II promoter |
| Cell proliferation | Regulation of transcription, DNA-templated | Transport |
| Positive regulation of gene expression | Ras protein signal transduction | Positive regulation of gene expression |
| Positive regulation of transcription from RNA polymerase II promoter | Positive regulation of gene expression | Visual perception |
| Regulation of signal transduction by p53 class mediator | Negative regulation of transcription, DNA-templated | Signal transduction |
| Cellular response to DNA damage stimulus | Cell proliferation | Negative regulation of neuron apoptotic process |
| DNA damage response, signal transduction by p53 class mediator resulting in transcription of p21 class mediator | Negative regulation of cell growth | Negative regulation of apoptotic process |
| Intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator | Cell cycle arrest | Nervous system development |
| Ras protein signal transduction | Viral process | Negative regulation of transcription from RNA polymerase II promoter |
| Regulation of apoptotic process | Cellular response to drug | Liver development |
| Cellular response to drug | Cellular response to DNA damage stimulus | Sensory perception of sound |
| Cell differentiation | Positive regulation of apoptotic process | ER to Golgi vesicle-mediated transport |
| Response to X-ray | Replicative senescence | Positive regulation of cell proliferation |
| Negative regulation of transcription, DNA-templated | Cell differentiation | Response to hypoxia |
| Cell cycle arrest | Regulation of apoptotic process | Response to estradiol |
| Negative regulation of cell growth | Regulation of signal transduction by p53 class mediator | Transcription, DNA-templated |
Differences of the combined scores (see Materials and methods) for disease-associated nsSNVs and common variants are shown in parentheses.