Hafumi Nishi, Junichi Nakata, Kengo Kinoshita.
Abstract
Recent advances in DNA sequencing techniques have identified rare single-nucleotide variants (SNVs) with less than 1% minor allele frequency. Despite the growing interest in, and physiological importance of, rare variants in genome sciences, less attention has been paid to the allele frequency of variants in protein sciences. To elucidate the characteristics of genetic variants on protein interaction sites, from the viewpoints of the allele frequency and the structural position of variants, we mapped about 20,000 human SNVs onto protein complexes. We found that variants are less abundant in protein interfaces, and specifically in the core regions of interfaces. The tendency to "avoid" the interfacial core is stronger among common variants than among rare variants. As for amino acid substitutions, the pattern of mutated amino acids among rare variants is consistent across different interfacial regions, reflecting the fact that rare variants result from random mutations in DNA sequences, whereas the amino acid changes of common variants vary between the interfacial core and rim regions, possibly due to functional constraints on proteins. This study illustrates how the allele frequency of variants relates to protein structural regions and functional sites in general, and will lead to a deeper understanding of the potential deleteriousness of rare variants at the structural level. Exceptions to the observed trends will shed light on the limitations of structural approaches for evaluating the functional impacts of variants. Published by Wiley-Blackwell.
Keywords: 3D structure; nonsynonymous mutations; protein complex; protein-protein interface; rare variants
Year: 2015 PMID: 26580303 PMCID: PMC4815344 DOI: 10.1002/pro.2845
Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725
Distributions of Single‐Nucleotide Variants on Proteins
| | Interface: Core | Interface: Support | Interface: Rim | Interface: Total | Non‐interface: Protein Internal | Non‐interface: Protein Surface | Total |
|---|---|---|---|---|---|---|---|
| All variants | 1550 | 841 | 1813 | 4204 | 7207 | 8894 | 20,305 |
| Rare | 1516 | 828 | 1748 | 4092 | 7015 | 8533 | 19,640 |
| Intermediate | 20 | 8 | 29 | 57 | 109 | 176 | 342 |
| Common | 14 | 5 | 36 | 55 | 83 | 185 | 323 |
| Non‐variant | 27,154 | 17,725 | 26,799 | 71,678 | 146,816 | 128,860 | 347,354 |
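The counts in the table can be turned into per-region shares to see how strongly each variant class avoids the interfacial core. The following is a minimal sketch, not the authors' statistical procedure: the names `REGIONS`, `COUNTS`, `region_shares`, and `ratio_to_nonvariant` are illustrative assumptions, and the ratio simply divides a region's share among one variant class by its share among non-variant sites.

```python
# Counts copied from the table above, in the order given by REGIONS.
REGIONS = ["Core", "Support", "Rim", "Protein Internal", "Protein Surface"]
COUNTS = {
    "Rare": [1516, 828, 1748, 7015, 8533],
    "Intermediate": [20, 8, 29, 109, 176],
    "Common": [14, 5, 36, 83, 185],
    "Non-variant": [27154, 17725, 26799, 146816, 128860],
}


def region_shares(counts):
    """Return each region's share of the row total."""
    total = sum(counts)
    return [c / total for c in counts]


def ratio_to_nonvariant(variant_class, region):
    """Region share within one variant class, divided by the same
    region's share among non-variant sites (hypothetical helper)."""
    i = REGIONS.index(region)
    variant_share = region_shares(COUNTS[variant_class])[i]
    background_share = region_shares(COUNTS["Non-variant"])[i]
    return variant_share / background_share


if __name__ == "__main__":
    for cls in ("Rare", "Intermediate", "Common"):
        print(f"{cls}: core ratio = {ratio_to_nonvariant(cls, 'Core'):.2f}")
```

With the table's counts, the core ratio is about 0.99 for rare, 0.75 for intermediate, and 0.55 for common variants, in line with the stronger avoidance of the interfacial core among common variants. The intermediate and common classes contain few core variants (20 and 14), so these ratios are rough.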
Figure 1. Relative frequency of each amino acid among rare and nonrare variants. Left: rare variants; right: nonrare (intermediate and common) variants. The relative frequency of an amino acid is defined as the ratio of its percentage among variant sites to its percentage among nonvariant sites. In nonrare variants, Q and F overlap at (0,0).
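The relative frequency defined in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions: the function name `relative_frequency` and the toy residue strings are hypothetical, and the inputs are one-letter amino acid codes at variant and nonvariant sites. An amino acid that never occurs at a variant site gets a relative frequency of 0. One that never occurs at a nonvariant site is skipped, because its ratio is undefined.

```python
from collections import Counter


def relative_frequency(variant_residues, nonvariant_residues):
    """Ratio of each amino acid's percentage among variant sites to its
    percentage among nonvariant sites (hypothetical helper)."""
    variant_counts = Counter(variant_residues)
    nonvariant_counts = Counter(nonvariant_residues)
    n_variant = len(variant_residues)
    n_nonvariant = len(nonvariant_residues)

    ratios = {}
    # Iterate over amino acids seen at nonvariant sites so the
    # denominator is never zero; Counter returns 0 for missing keys.
    for aa, background in nonvariant_counts.items():
        pct_variant = 100 * variant_counts[aa] / n_variant
        pct_nonvariant = 100 * background / n_nonvariant
        ratios[aa] = pct_variant / pct_nonvariant
    return ratios


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the study.
    print(relative_frequency("RRAG", "RAAAGGLL"))
```

In the toy input, R makes up 50% of variant sites but 12.5% of nonvariant sites, so its relative frequency is 4.0, while L is absent from the variant sites and gets 0.0.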
Figure 2. An example of common variants on a protein interface core. A: The dimeric structure of epoxide hydrolase (PDB ID: 4j03). B: The variant site (Arg287) on the dimeric interface. The mutating arginine residues are shown as orange stick models. The figure was prepared with PyMol (ref. 19).