Eric T C Wong, Victor So, Mike Guron, Erich R Kuechler, Nawar Malhis, Jennifer M Bui, Jörg Gsponer.
Abstract
Because proteins are fundamental to most biological processes, many genetic diseases can be traced back to single nucleotide variants (SNVs) that cause changes in protein sequences. However, not all SNVs that result in amino acid substitutions cause disease, as each residue is under different structural and functional constraints. Influential studies have shown that protein-protein interaction interfaces are enriched in disease-associated SNVs and depleted in SNVs that are common in the general population. These studies focus primarily on folded (globular) protein domains and overlook the prevalent class of protein interactions mediated by intrinsically disordered regions (IDRs). Therefore, we investigated the enrichment patterns of missense mutation-causing SNVs that are associated with disease and cancer, as well as those present in the healthy population, in structures of IDR-mediated interactions with comparisons to classical globular interactions. When comparing the different categories of interaction interfaces, division of the interface regions into solvent-exposed rim residues and buried core residues reveals distinctive enrichment patterns for the various types of missense mutations. Most notably, we demonstrate a strong enrichment at the interface core of interacting IDRs in disease mutations and its depletion in neutral ones, which supports the view that the disruption of IDR interactions is a mechanism underlying many diseases. Intriguingly, we also found an asymmetry across the IDR interaction interface in the enrichment of certain missense mutation types, which may hint at an increased variant tolerance and urges further investigations of IDR interactions.
Keywords: human disease; interface core and rim; intrinsically disordered proteins; protein–protein interactions; single nucleotide variants
Year: 2020 PMID: 32722039 PMCID: PMC7463635 DOI: 10.3390/biom10081097
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Structural regions analyzed in this study. The structural regions were defined based on solvent-accessible surface areas measured from protein complex structures. Residues whose relative solvent-accessible surface area (rASA; see Methods) changes between the bound and unbound conformations were defined as core residues (red) if their rASA in the bound structure is smaller than 0.25 and as rim residues (blue) if it is greater than 0.25. Buried residues are non-interface residues with rASAs smaller than 0.25 in the unbound structures, and the remainder are surface residues (gray). Because the full-length protein often contains regions without structural data coverage, these structurally undefined sequences were classified as external regions in our analyses.
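The region definitions in the Figure 1 caption can be read as a small decision rule. The following is a minimal sketch of that reading, not the authors' implementation: the function name `classify_residue`, the tolerance `tol` used to decide whether rASA "changes" on binding, and the treatment of a value of exactly 0.25 (assigned here to rim or surface) are assumptions, since the caption does not specify them.

```python
from typing import Optional

RASA_CUTOFF = 0.25  # threshold stated in the Figure 1 caption


def classify_residue(
    rasa_unbound: Optional[float],
    rasa_bound: Optional[float],
    tol: float = 1e-9,  # hypothetical tolerance for "rASA changed on binding"
) -> str:
    """Assign a residue to one of the five regions described for Figure 1."""
    # No structural coverage: the residue belongs to an external region.
    if rasa_unbound is None or rasa_bound is None:
        return "external"

    # Interface residues are those whose rASA differs between the two states.
    if abs(rasa_unbound - rasa_bound) > tol:
        # Assumption: exactly 0.25 is treated as rim.
        return "core" if rasa_bound < RASA_CUTOFF else "rim"

    # Non-interface residues are split on their rASA in the unbound structure.
    # Assumption: exactly 0.25 is treated as surface.
    return "buried" if rasa_unbound < RASA_CUTOFF else "surface"


if __name__ == "__main__":
    for unbound, bound in [(0.60, 0.10), (0.60, 0.40), (0.10, 0.10), (0.50, 0.50), (None, None)]:
        print(unbound, bound, classify_residue(unbound, bound))
```

The rASA values themselves would come from a solvent-accessibility calculation on the complex and on the isolated chain, which this sketch takes as given.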
Figure 2. Odds ratios of SwissVar single nucleotide variants (SNVs). An odds ratio (OR) is calculated for each protein region using all residues in the dataset as the reference distribution. The bar graph plots the ORs (Y-axis) of each protein category and protein region (X-axis). Each OR is the odds of mutation in the specific region divided by the odds of mutation in the full-length parent proteins. The Y-axis is centered at one; ORs > 1 indicate enrichment and ORs < 1 indicate depletion. Structural regions are color-coded (see Figure 1). Statistical significance is denoted by asterisks: * p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001. ORs and p-values can be found in Supplementary Table S2.
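As an illustration of the odds ratio described in the Figure 2 caption, the sketch below divides the odds of mutation in one region by the odds of mutation in the full-length parent proteins. It assumes that counts are taken per residue (mutated versus non-mutated positions); the function names and the example counts are hypothetical, and the significance test behind the reported p-values is not covered because the caption does not name it.

```python
def odds(mutated: int, total: int) -> float:
    """Odds of mutation: mutated positions divided by non-mutated positions."""
    if not 0 <= mutated < total:
        raise ValueError("need 0 <= mutated < total")
    return mutated / (total - mutated)


def odds_ratio(
    region_mutated: int,
    region_total: int,
    reference_mutated: int,
    reference_total: int,
) -> float:
    """Odds of mutation in a region relative to the reference distribution."""
    reference_odds = odds(reference_mutated, reference_total)
    if reference_odds == 0:
        raise ValueError("reference set contains no mutated positions")
    return odds(region_mutated, region_total) / reference_odds


if __name__ == "__main__":
    # Hypothetical counts: 50 of 100 region residues mutated,
    # 100 of 500 residues mutated across the full-length proteins.
    value = odds_ratio(50, 100, 100, 500)
    print(value, "enriched" if value > 1 else "depleted or neutral")
```

With these made-up counts the region odds are 50/50 and the reference odds are 100/400, so the ratio is above one, which the caption would read as enrichment.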
Figure 3. Odds ratios of COSMIC SNVs. (A) Odds ratios of all COSMIC SNVs. (B,C) Odds ratios for the subsets of proteins categorized as (B) tumor suppressors and (C) oncoproteins. See Figure 2 for details. p-values for all odds ratios can be found in Supplementary Table S2.
Figure 4. Odds ratios of gnomAD SNVs. (A) Odds ratios are calculated using gnomAD SNVs with frequencies between 0.1 and 10⁻⁶. (B) Odds ratios are calculated using gnomAD SNVs with frequencies between 5 × 10⁻⁶ and 10⁻⁶, i.e., rare SNVs. (C) Odds ratios are calculated using gnomAD SNVs with frequencies between 0.1 and 0.001, i.e., high-frequency SNVs. See Figure 2 for details.
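The three frequency ranges in the Figure 4 caption overlap, so a single variant can contribute to more than one panel. A minimal sketch of that binning follows; the bin names, the function name, and the choice to treat both range boundaries as inclusive are assumptions, as the caption does not state how boundary values are handled.

```python
from typing import Dict, List, Tuple

# Frequency ranges (lower, upper) taken from the Figure 4 caption.
# Bin names are hypothetical labels for panels A, B and C.
FREQUENCY_BINS: Dict[str, Tuple[float, float]] = {
    "all": (1e-6, 0.1),             # panel A
    "rare": (1e-6, 5e-6),           # panel B
    "high_frequency": (1e-3, 0.1),  # panel C
}


def bins_for_frequency(allele_frequency: float) -> List[str]:
    """Return every panel bin whose range contains the allele frequency.

    Assumption: both ends of each range are inclusive.
    """
    return [
        name
        for name, (lower, upper) in FREQUENCY_BINS.items()
        if lower <= allele_frequency <= upper
    ]


if __name__ == "__main__":
    for frequency in (2e-6, 1e-4, 0.01, 0.5):
        print(frequency, bins_for_frequency(frequency))
```

A variant at an intermediate frequency such as 10⁻⁴ falls only in the panel A range, which is why panel A is not simply the union of panels B and C.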