Robert Fragoza1,2, Jishnu Das3,4, Shayne D Wierbowski1,2, Jin Liang1,2, Tina N Tran5,6, Siqi Liang1,2, Juan F Beltran1,2, Christen A Rivera-Erick1,2, Kaixiong Ye1, Ting-Yi Wang1,2, Li Yao1,2, Matthew Mort7, Peter D Stenson7, David N Cooper7, Xiaomu Wei1, Alon Keinan1, John C Schimenti5, Andrew G Clark1,6, Haiyuan Yu8,9.
Abstract
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs; 421 of these SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
Year: 2019 PMID: 31515488 PMCID: PMC6742646 DOI: 10.1038/s41467-019-11959-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 A pipeline for surveying the impact of 2009 SNVs on protein–protein interactions. a Phenotypic consequences of coding variants in human genotypes can be interpreted as products of protein–protein interaction perturbations in the interactome. b Over half of all unique missense variants in ExAC are singletons. To avoid oversampling very rare variants from ExAC, 1676 ExAC variants were selected across a wide range of allele frequencies. 204 disease-associated mutations listed in HGMD and 162 cancer somatic mutations from COSMIC were also examined. c Pipeline for testing the functional impact of 2009 SNVs on protein interactions and the stability impact of 278 population variants by dual-fluorescence screen.
Fig. 2 Assessing the impact of disruptive alleles on protein function. a Fraction of protein pairs recovered by PCA for disrupted and intact interactions in comparison to positive and random reference sets (PRS and RRS). P values by one-tailed Z-test between disrupted and intact interactions. P values by two-tailed Z-test for all other comparisons. b Fraction of disruptive variants in ExAC (blue) across four allele frequency ranges (i) <0.1%, (ii) 0.1–1.0%, (iii) 1.0–10%, and (iv) >10%. P value by chi-square test. Fraction of disruptive somatic mutations in COSMIC (purple) in known cancer-affiliated genes or other genes and fraction of disruptive germline disease-associated genes from HGMD (red) are also shown. P values by one-tailed Z-test. c Reported number of functional missense variants per individual genome varies extensively across different studies. d ExAC variants tested against ≥ 2 interactions further partitioned into three disruption categories. Distribution of e allele frequency, f Grantham scores, and g PolyPhen-2 scores across three disruption categories. Error bars in a and b indicate +SE of proportion. Thick black bars in g are the interquartile range, white dots display the median, and extended thin black lines represent 95% confidence intervals. P values in e and g by one-tailed U-test. P values in f by two-tailed U-test. See also Supplementary Tables 1–3, Supplementary Data 2–4, and Supplementary Fig. 1.
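The captions report error bars as the SE of a proportion and compare fractions by Z-test. The sketch below shows one common way to compute both, assuming a pooled two-proportion Z-test with a normal approximation; the record does not state which Z-test variant was used, and the function names and the counts in the usage lines are hypothetical, not data from the study.

```python
import math

def se_of_proportion(successes: int, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int, tail: str = "one"):
    """Pooled two-proportion Z-test (an assumed variant, not confirmed by the source).

    tail="one" tests whether proportion 1 exceeds proportion 2;
    tail="two" tests for a difference in either direction.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    if tail == "one":
        # Upper-tail probability of the standard normal distribution
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
    else:
        # Two-sided probability: 2 * (1 - Phi(|z|))
        p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only
z, p = two_proportion_z_test(40, 100, 20, 100, tail="one")
print(f"z = {z:.2f}, one-tailed P = {p:.4f}")
print(f"SE of proportion = {se_of_proportion(40, 100):.3f}")
```

The pooled standard error is the usual choice under the null hypothesis of equal proportions; an unpooled version would give slightly different values.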
Fig. 3 Disruptive population variants seldom result in unstable protein expression. a Western blots for representative wild-type:variant pairs across three stability categories detected using α-GFP. α-GAPDH was used as a loading control. b DUAL-FLUO protein stability scores for 278 wild-type:variant pairs. Dashed blue line represents 1:1 ratio between stability scores for mutant and wild-type. c Fraction of variants residing in LoF-intolerant genes (pLI ≥ 0.9) for stable (n = 199), moderately stable (n = 53), and unstable (n = 10) protein stability categories. d Ratio of mutant-to-wild-type stability scores corresponding to non-disruptive (n = 103), partially disruptive (n = 45), and null-like variants (n = 12). e Diagram of interactions disrupted by null-like AKR7A2_A142T variant. Cellular expression levels of V5-tagged AKR7A2 were measured by Western blot using α-V5. α-γ-Tubulin was used as a loading control. * indicates 37 kDa marker. f In vitro specific activities of purified recombinant AKR7A2 wild-type and A142T using succinic semialdehyde substrate. Fitted curves (dashed lines) are shown for wild-type and A142T. P value by one-tailed t-test. Error bars indicate ± SE of mean at eight different substrate concentrations. Error bars in c and d indicate +SE of proportion. P values in c and d by one-tailed U-test. See also Supplementary Figs. 2, 6, and 7a.
Fig. 4 Disruptive variants occur in important gene groups and at conserved genomic sites. a Fraction of disruptive variants that occur in non-constrained (n = 1349), disease-associated (n = 423), cancer-associated (n = 78), essential (n = 223), or LoF-intolerant genes (n = 270). b Fraction of interactions disrupted by variants that occur on interface residues or interface domains (n = 307) in comparison to interactions disrupted by variants that occur away from interaction interfaces (n = 41). c Distribution of Jensen–Shannon Divergence scores for amino acid residues at sites corresponding to disruptive and non-disruptive variants. Larger scores indicate more conserved sites. d Fraction of disruptive variants found in genomic regions where Fay and Wu’s H is significant, measured across four different population groups and across the overall population. e Fraction of mutation pairs that lead to the same disease for germline mutations that share two or more disrupted interactions (n = 42), share one or more disrupted interactions (n = 271), or do not share disrupted interactions (n = 599). f Schematic of interaction disruption profiles for SMAD4 disease-associated mutations E330K, G352R, and N13S. Corresponding disease names are labeled. g Co-crystal structure of SMAD4–SMAD3 interacting proteins (PDB ID: 1U7F). Disease-associated mutations are labeled. The structure covers SMAD4 residues 315–546, and therefore the N13S mutation is not represented on it. Error bars in a, b, d, and e indicate +SE of proportion. P values in a, b, d, and e by one-tailed Z-test. P value in c by one-tailed U-test. See also Supplementary Fig. 4.
Fig. 5 Prioritizing candidate disease-associated mutations through shared disruption profiles. a Schematic of interaction disruption profiles for disease-associated mutation D32N and rare variants T152I and T149M. Stable expression of FLAG-tagged wild-type and mutant PSPH proteins was validated by Western blot using α-FLAG. α-γ-Tubulin was used as a loading control. A brief diagram of PSPH phosphatase activity is shown. * indicates 37 kDa marker. b Enzymatic activity of purified recombinant wild-type and mutant PSPH proteins using phosphoserine substrate was measured in vitro using a malachite green assay performed in triplicate. Enzymatic activities for PSPH mutants are shown in proportion to wild-type activity. Error bars indicate +SE of mean. P value by one-tailed t-test. c Schematic of interaction disruption profiles for SEPT12 rare variant G169E and disease-associated mutation D197N. d Homology model of SEPT12–SEPT1 interaction. PDB ID 5CYO chains A and B used as template. Disruptive mutations on the interaction interface are labeled. e Disruption of SEPT12 interaction with SEPT1 by G169E and D197N was validated by co-IP. SEPT12 bait proteins were detected using α-FLAG. SEPT1 prey was detected using α-HA. α-GAPDH was used as a loading control. f Fertility tests of 2–6-month-old WT (n = 2 males, avg = 8.9 ± 0.51; n = 2 females, avg = 8.6 ± 0.61) and Sept12 (n = 3 males, avg = 4.0 ± 1.3; n = 2 females, avg = 9.2 ± 0.57) mice bred to age-matched controls. Litter sizes were recorded. Green = males. Blue = females. No comparison is significant except male WT vs. male Sept12 (P = 0.00052; by two-tailed t-test). g Assessment of sperm motility of WT (n = 2, sperm = 166), Sept12 (n = 4, sperm = 484), and Sept12 (n = 3, sperm = 416) mice. See also Supplementary Figs. 7b, 8, and 9.