Matteo Cagiada, Kristoffer E Johansson, Audrone Valanciute, Sofie V Nielsen, Rasmus Hartmann-Petersen, Jun J Yang, Douglas M Fowler, Amelie Stein, Kresten Lindorff-Larsen.
Abstract
Understanding and predicting how amino acid substitutions affect proteins are keys to our basic understanding of protein function and evolution. Amino acid changes may affect protein function in a number of ways including direct perturbations of activity or indirect effects on protein folding and stability. We have analyzed 6,749 experimentally determined variant effects from multiplexed assays on abundance and activity in two proteins (NUDT15 and PTEN) to quantify these effects and find that a third of the variants cause loss of function, and about half of loss-of-function variants also have low cellular abundance. We analyze the structural and mechanistic origins of loss of function and use the experimental data to find residues important for enzymatic activity. We performed computational analyses of protein stability and evolutionary conservation and show how we may predict positions where variants cause loss of activity or abundance. In this way, our results link thermodynamic stability and evolutionary conservation to experimental studies of different properties of protein fitness landscapes.
Keywords: deep mutational scanning; disease variants; genomics; multiplexed assays of variant effects; protein stability; protein structure–function; protein variants
Year: 2021 PMID: 33779753 PMCID: PMC8321532 DOI: 10.1093/molbev/msab095
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Overview of the NUDT15 and PTEN multiplexed data analyzed in this work. Panels (A) and (B) show 2D histograms that combine the data from the activity-based MAVE on the y axis with the results from the VAMP-seq experiment on the x axis. Variants are categorized based on the region of the 2D histogram (dashed lines) they belong to. The fractions of variants falling in each of the four quadrants are indicated, with errors of the mean estimated by bootstrapping using the uncertainties of the experimental scores. The two green points indicate the wild type. Arrows on the axes indicate directions of greater abundance or activity; for detailed definitions of the scores and their uncertainties, we refer the reader to the original publications (Matreyek et al. 2018; Mighell et al. 2018; Suiter et al. 2020). Panels (C) and (D) show a per-position consensus category (CC) colored onto the structure of the proteins (PDB entry 5LPG for NUDT15 and 1D5R for PTEN). Panels (E) and (F) show the positional color categories together with the secondary structure (ST) and solvent accessibility (SA). The four classes of variants/positions are represented by a color: “WT-like” (green), “Low activity, high abundance” (blue), “Low abundance, high activity” (yellow), and “Total loss” (red).
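The quadrant assignment and bootstrapped fractions described in the Fig. 1 caption can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the cutoff values, the Gaussian resampling of each score from its reported uncertainty, and the "most common category" rule for the per-position consensus are all assumptions made here, and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical cutoffs (the dashed lines in the figure); the real values
# are defined in the original publications, not in this caption.
ABUNDANCE_CUTOFF = 0.5
ACTIVITY_CUTOFF = 0.5

CATEGORIES = (
    "WT-like",
    "Low activity, high abundance",
    "Low abundance, high activity",
    "Total loss",
)


def classify(abundance, activity,
             ab_cut=ABUNDANCE_CUTOFF, act_cut=ACTIVITY_CUTOFF):
    """Assign a variant to one of the four quadrants."""
    high_ab = abundance >= ab_cut
    high_act = activity >= act_cut
    if high_ab and high_act:
        return "WT-like"
    if high_ab:
        return "Low activity, high abundance"
    if high_act:
        return "Low abundance, high activity"
    return "Total loss"


def category_fractions(variants):
    """variants: list of (abundance, activity) -> fraction per category."""
    counts = Counter(classify(ab, act) for ab, act in variants)
    n = len(variants)
    return {c: counts[c] / n for c in CATEGORIES}


def bootstrap_fractions(variants_with_err, n_boot=1000, seed=0):
    """variants_with_err: list of (abundance, ab_sd, activity, act_sd).

    Assumption: each score is redrawn from a normal distribution centred
    on the measured value with the reported uncertainty as its standard
    deviation. Returns {category: (mean fraction, standard deviation)}.
    """
    rng = random.Random(seed)
    samples = {c: [] for c in CATEGORIES}
    for _ in range(n_boot):
        resampled = [(rng.gauss(ab, ab_sd), rng.gauss(act, act_sd))
                     for ab, ab_sd, act, act_sd in variants_with_err]
        fractions = category_fractions(resampled)
        for c in CATEGORIES:
            samples[c].append(fractions[c])
    result = {}
    for c, vals in samples.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
        result[c] = (mean, var ** 0.5)
    return result


def consensus_category(variant_categories):
    """Assumed consensus rule: the most common category at a position."""
    return Counter(variant_categories).most_common(1)[0][0]
```

Calling `bootstrap_fractions` on a list of scored variants yields one mean fraction and one spread per quadrant, which is the kind of number annotated in panels (A) and (B); `consensus_category` would then be applied to the categories of all variants at a given residue to color panels (C) to (F).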
Fig. 2. Examples of “low-activity, high-abundance” positions. (A) Residues in PTEN in the low-activity, high-abundance category (blue) include residues in and surrounding the catalytic phosphatase site, including some that directly interact with the substrate (here mimicked by the inhibitor tartrate; Lee et al. 1999). (B) Other residues that are more distant from the active site also fall in this category, and variants in this region could perturb the integrity of the active site. (C and D) Examples of functionally important residues in NUDT15 that are close to, but outside of, the active site. In particular, we identified four conserved residues (Asn111, Asn117, Gln44, Arg44) that appear to connect via a hydrogen bond network, and whose perturbation could affect the hydrolysis of the thiopurines.
Fig. 3. Histograms of the two computational scores (ΔΔG and ΔΔE) in NUDT15 and PTEN. ΔΔG aims to capture effects purely on the thermodynamic stability, with high values indicating destabilized variants. ΔΔE captures evolutionary conservation, as calculated by a model that takes both site and pairwise coevolution into account, and with high values indicating nonconservative substitutions. Thus, for both ΔΔG and ΔΔE positive values indicate detrimental substitutions, whereas in the experiments low values indicate substitutions that cause loss of activity or abundance. For both proteins, we split the histograms according to the four categories of variants determined from the experiments, as indicated by the axes with high and low experimental scores for abundance and activity. Thus, for example, the two green histograms for NUDT15 indicate the distributions of ΔΔG and ΔΔE values for those variants that are classified as stable and active by the MAVEs, and indeed it is clear that most of these variants have scores that are below the cutoff (red dashed lines). In addition to the colored histograms, we also show the full histogram of all analyzed variants (gray) to ease comparison between the subsets and the full set of variants.
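The comparison described in the Fig. 3 caption, namely how many variants in each experimental category fall below a cutoff on a computational score, can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the cutoff of 2.0, and the example scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fraction_below_cutoff(variants, cutoff):
    """variants: iterable of (experimental_category, computational_score).

    Returns {category: fraction of its variants with score < cutoff}.
    High computational scores are read as detrimental, so a score below
    the cutoff is a prediction that the variant is tolerated.
    """
    below = defaultdict(int)
    total = defaultdict(int)
    for category, score in variants:
        total[category] += 1
        if score < cutoff:
            below[category] += 1
    return {category: below[category] / total[category] for category in total}


if __name__ == "__main__":
    # Made-up stability-type scores, for illustration only.
    example = [
        ("WT-like", -0.2), ("WT-like", 0.5), ("WT-like", 3.1),
        ("Total loss", 4.0), ("Total loss", 1.0),
    ]
    # Hypothetical cutoff standing in for the red dashed line.
    for category, frac in fraction_below_cutoff(example, cutoff=2.0).items():
        print(f"{category}: {frac:.2f} of variants below cutoff")
```

Running the same function once with the stability score and once with the conservation score gives the per-category summaries that the split histograms display visually.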