Hui-Chun Lu, Sun Sook Chung, Arianna Fornili, Franca Fraternali.
Abstract
Integration of protein structural information with human genetic variation and pathogenic mutations is essential to understand molecular mechanisms associated with the effects of polymorphisms on protein interactions and cellular processes. We investigate occurrences of non-synonymous SNPs in ordered and disordered protein regions by systematic mapping of common variants and disease-related SNPs onto these regions. We show that common variants accumulate in disordered regions; conversely, pathogenic variants are significantly depleted in disordered regions. These different occurrences of pathogenic and common SNPs can be attributed to a negative selection on random mutations in structurally highly constrained regions. New approaches in the study of quantitative effects of pathogenic-related mutations should effectively account for all the possible contexts and relative functional constraints in which the sequence variation occurs.
Keywords: disease-related mutations; non-synonymous SNPs; order-disorder propensity; protein disorder; protein flexibility
Year: 2015 PMID: 26322316 PMCID: PMC4532925 DOI: 10.3389/fmolb.2015.00047
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Figure 1. Analyses of non-synonymous single nucleotide polymorphisms (nsSNPs) in intra-domain ordered regions, intra-domain disordered regions and inter-domain disordered regions. (A) Scheme of protein regions. A protein contains (intra-)domain regions (dashed boundary line) and inter-domain regions. Domain regions contain ordered regions (INTRA-Dom ORs; light-green squares) and disordered regions (INTRA-Dom DRs; dark-green zigzag line). Inter-domain regions are predominantly disordered (INTER-Dom DR; red zigzag line). (B) SNP frequency analysis. The propensity of SNPs, P(SNP), to occur in each region was calculated using Equation 1. Average propensity values are reported as relative entropies, log(P(SNP)). Error bars were estimated using bootstrap re-sampling with 10,000 replicates. Stars denote the alpha levels of the test statistics (*p < 0.05; ***p < 0.001). (C) Number of SNPs mapped onto different protein regions. The number of nsSNPs in each class and the average lengths of the protein regions are listed together with the standard error of the mean (SEM). The column “N(proteins)” contains the number of proteins selected for the study of a SNP class, while the column “N(SNPs)” reports the total number of SNPs mapped onto the reference proteins.
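The bootstrap estimate of the error bars in panel (B) can be illustrated with a short sketch. Equation 1 is not reproduced in this record, so the propensity below is an assumed, commonly used form (the fraction of SNPs falling in a region divided by the fraction of sequence length covered by that region); the natural logarithm, the choice to resample whole proteins, the function names, and the toy counts are likewise hypothetical and not taken from the paper.

```python
import math
import random
import statistics

# Region labels follow the Figure 1 caption.
REGIONS = ("INTRA-Dom OR", "INTRA-Dom DR", "INTER-Dom DR")


def log_propensity(proteins, region):
    """Log of an ASSUMED propensity: (SNP fraction) / (length fraction).

    This is a stand-in for Equation 1, which is not shown in this record.
    Each protein is a dict with per-region 'snps' counts and 'length' values.
    """
    snps_in_region = sum(p["snps"][region] for p in proteins)
    snps_total = sum(sum(p["snps"].values()) for p in proteins)
    len_in_region = sum(p["length"][region] for p in proteins)
    len_total = sum(sum(p["length"].values()) for p in proteins)
    propensity = (snps_in_region / snps_total) / (len_in_region / len_total)
    return math.log(propensity)  # natural log; the base is an assumption


def bootstrap_error(proteins, region, n_replicates=10_000, seed=0):
    """Standard deviation of log-propensity over bootstrap replicates.

    Proteins are resampled with replacement (an assumed resampling unit).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        resample = rng.choices(proteins, k=len(proteins))
        replicates.append(log_propensity(resample, region))
    return statistics.stdev(replicates)


# Hypothetical toy data: every count is positive so that log() is defined
# for every possible resample.
toy_proteins = [
    {"snps": dict(zip(REGIONS, (2, 1, 3))), "length": dict(zip(REGIONS, (200, 40, 60)))},
    {"snps": dict(zip(REGIONS, (1, 1, 2))), "length": dict(zip(REGIONS, (150, 30, 120)))},
    {"snps": dict(zip(REGIONS, (3, 2, 1))), "length": dict(zip(REGIONS, (300, 50, 50)))},
]

if __name__ == "__main__":
    for region in REGIONS:
        value = log_propensity(toy_proteins, region)
        error = bootstrap_error(toy_proteins, region)
        print(f"{region}: log P(SNP) = {value:+.3f} +/- {error:.3f}")
```

Under this assumed definition, a positive value means SNPs are enriched in a region relative to its share of sequence length, and a negative value means they are depleted. Real data would need explicit handling of regions with zero SNPs, where the logarithm is undefined.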
Figure 2. Example of changes in disordered regions (DRs) conferred by SNPs in distant ordered regions. (A) Disorder prediction by DISOPRED2 for the wild-type (WT) and mutated sequence segments (residues 600–615) of BRAF. Each column is labeled with the specific SNP used for DR prediction and contains the confidence scores of the DISOPRED2 prediction: the raw disorder probability scores, with the filtered scores in parentheses. Residues in DRs are annotated with asterisks (*) and colored in blue. SNPs within the sequence segment 600–615 are colored in yellow. (B) Plot of the DISOPRED2 filtered confidence scores of the BRAF WT and mutated sequences. The predicted behavior of V600E (red line) is distinct from that of the BRAF WT sequence (thick black line). The horizontal blue line indicates the 5% filter threshold of the method. The inset shows the 3D structure of the BRAF kinase domain (4MNE_B, cyan cartoon), the location of residue V600 (yellow licorice) and the predicted disordered positions (light-green spheres).