Alexander W Golinski, Patrick V Holec, Katelynn M Mischler, Benjamin J Hackel.
Abstract
Evolving specific molecular recognition function of proteins requires strategic navigation of a complex mutational landscape. Protein scaffolds aid evolution via a conserved platform on which a modular paratope can be evolved to alter binding specificity. Although numerous protein scaffolds have been discovered, the underlying properties that permit binding evolution remain unknown. We present an algorithm to predict a protein scaffold's ability to evolve novel binding function based upon computationally calculated biophysical parameters. The ability of 17 small proteins to evolve binding functionality across seven discovery campaigns was determined via magnetic activated cell sorting of 10¹⁰ yeast-displayed protein variants. Twenty topological and biophysical properties were calculated for 787 small protein scaffolds and reduced into independent components. Regularization deduced which extracted features best predicted binding functionality, providing a 4/6 true positive rate, a 9/11 negative predictive value, and a 4/6 positive predictive value. Model analysis suggests a large, disconnected paratope will permit evolved binding function. Previous protein engineering endeavors have suggested that starting with a highly developable (high producibility, stability, solubility) protein will offer greater mutational tolerance. Our results support this connection between developability and evolvability by demonstrating a relationship between protein production in the soluble fraction of Escherichia coli and the ability to evolve binding function upon mutation. We further explain the necessity for initial developability by observing a decrease in proteolytic stability of protein mutants that possess binding functionality over nonfunctional mutants. Future iterations of protein scaffold discovery and evolution will benefit from a combination of computational prediction and knowledge of initial developability properties.
Keywords: predictive algorithm; protein evolvability; protein scaffolds
Year: 2019 PMID: 30681831 PMCID: PMC6458986 DOI: 10.1021/acscombsci.8b00182
Source DB: PubMed Journal: ACS Comb Sci ISSN: 2156-8944 Impact factor: 3.784
Figure 1. Algorithm for protein scaffold discovery. Small proteins deposited in the Protein Data Bank are analyzed for structural, chemical, and predicted stability parameters. Proteins for experimental evaluation are chosen via a proposed model to predict binding performance. Protein scaffold libraries consisting of millions of unique variants are expressed with diversified binding interfaces. Binding function is evaluated against several molecular targets to determine which proteins evolve specific binding variants. The observed binding performance is then used to adjust the predictive model. Iterative evaluation can be performed.
Evaluated Descriptors of Protein Scaffolds
| factor | description | mean ± SD |
|---|---|---|
| contact degree | total number of residue contacts within 8 Å | 920 ± 270 AU |
| contact order | sum of contact sequence separation divided by size and contact degree | 0.38 ± 0.01 AU |
| long-range contact degree | number of residue contacts with sequence separation >12 divided by size | 11.8 ± 3.1 AU |
| paratope contact degree | total number of residue contacts within 8 Å between a paratope and conserved residue | 430 ± 140 AU |
| paratope contact order | sum of paratope contacts sequence separation divided by paratope size and contact degree | 1.2 ± 0.4 AU |
| paratope stiffness | average stiffness of the paratope in an anisotropic network model | –0.28 ± 0.39 AU |
| charged SASA | conserved solvent accessible surface area of D, E, K, R | 980 ± 430 Å² |
| hydrophobic SASA | conserved solvent accessible surface area of A, F, G, I, L, M, P | 790 ± 340 Å² |
| polar SASA | conserved solvent accessible surface area of C, H, N, Q, S, T, W, Y | 780 ± 360 Å² |
| paratope angle | [paratope 1: entire scaffold: paratope 2] angle based upon centers of volume | 110 ± 30° |
| paratope SASA | solvent-exposed surface area of an alanine-scanned paratope region | 780 ± 360 Å² |
| paratope separation | distance between the center of volumes of the paratopes | 16 ± 6 Å |
| projected paratope area | two-dimensional projected area of the paratope in the orientation of maximum area | 74 ± 25 AU |
| projected paratope perimeter | perimeter of the projected area of the paratope in the orientation of maximum area | 1.2 ± 0.4 AU |
| buried NPSA | amount of buried nonpolar surface area upon folding | 2700 ± 900 Å² |
| FoldX DDG | mean difference in stability from parental across 50 variants | 17 ± 12 kJ/mol |
| FoldX energy | mean energy of 50 NNK variants using FoldX’s forcefield | 35 ± 25 kJ/mol |
| new SASA | amount of solvent exposed area created when removing unstructured termini | 320 ± 260 Å² |
| secondary structure percent | percent of residues in an α-helix or β-sheet | 51 ± 12% |
| size | total number of residues in the scaffold | 47 ± 7 AA |
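
The contact-based descriptors in the table can be illustrated with a minimal sketch. It assumes one representative coordinate per residue (for example the Cα atom), counts each residue pair once, and treats "within 8 Å" as inclusive; the source does not state these details, so the function name `contact_metrics` and its normalizations are illustrative and are not expected to reproduce the tabulated values.

```python
import math
from itertools import combinations

def contact_metrics(coords, cutoff=8.0, long_range_sep=12):
    """Toy contact descriptors from one (x, y, z) point per residue.

    Assumptions (not from the source): one point per residue, each pair
    counted once, distance cutoff inclusive.
    """
    n = len(coords)
    # All residue pairs (i < j) whose points lie within the cutoff distance.
    contacts = [(i, j) for i, j in combinations(range(n), 2)
                if math.dist(coords[i], coords[j]) <= cutoff]
    degree = len(contacts)
    # Contact order: summed sequence separation divided by size and contact degree.
    sep_sum = sum(j - i for i, j in contacts)
    contact_order = sep_sum / (n * degree) if degree else 0.0
    # Long-range contact degree: contacts with separation > 12, divided by size.
    long_range = sum(1 for i, j in contacts if j - i > long_range_sep) / n
    return degree, contact_order, long_range

# Four points on a line, 5 Å apart: only sequence neighbours are in contact.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
degree, contact_order, long_range = contact_metrics(toy)
```

The paratope variants of these descriptors would restrict the pair list to contacts between a paratope residue and a conserved residue.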
Figure 2. Protein scaffold candidates show varying binding performance. (A–Q) The 17 assayed protein scaffolds with conserved region colored gray and variable paratope colored red. (R) 787 protein scaffolds of 30–65 amino acids with two solvent-exposed loops were computationally analyzed for 20 topological and biophysical factors (Table ). The z-score distributions across all scaffolds are depicted by the box plots (box, 25–75th percentile; center bar, median; whiskers, 1.5 × interquartile range). The plotted values for each of the 17 assayed scaffolds indicate that a diverse set of proteins was assayed. (S) A pooled sample of 1 × 10¹⁰ variants across 17 scaffolds was enriched for binding variants in seven campaigns. MACS sorting was performed until seven binding populations were identified toward diverse molecular targets. Positive selection sorts (bold molecular target) were completed after two depletion sorts of the other listed targets. Binding functionality, quantified here as increased relative yield over control beads, was observed in all campaigns. (T) The relative binding performance for each scaffold against each molecular target as determined by the difference in scaffold abundance from the initial population to the binding populations. Scaffold abundance combines unique variants and variant binding strength using exponential dampening of sequence counts. Inset: The initial abundance of each scaffold. Error bars represent standard error (n = 3).
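
The z-score and box-plot summary described for panel R can be sketched as follows. This is only an illustration: the choice of population standard deviation, the quartile method (Python's default), and the convention that whiskers stop at the most extreme point within 1.5 × IQR of the box are assumptions, and the input values are made up.

```python
import statistics

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def box_summary(values):
    """Quartiles plus whiskers at the most extreme points within 1.5 x IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(v for v in values if v >= low_fence),
        "whisker_high": max(v for v in values if v <= high_fence),
    }

# Made-up descriptor values for six hypothetical scaffolds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
standardized = z_scores(descriptor)
summary = box_summary(descriptor)
```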
Figure 3. Successful protein scaffolds have diverse topologies. The identity, natural function, structure, and sequence of the top performing scaffolds are presented. The top proteins have various amounts and types of secondary structure. Diversified paratope residues are colored red in both the primary sequence and PyMOL rendering of the protein. Strikethroughs in the sequence represent residues present in the solved structure that were removed in our experimental analysis (as unstructured termini).
Figure 4. Large disconnected paratopes are associated with increased binding performance. Independent component analysis (ICA) was performed to describe the independent features of protein scaffolds. Elastic net regularization was performed to determine which of the features predicted binding performance. The resulting linear model was composed of two independent components and a constant term, yielding a leave-one-out (LOO) RMSE of 0.06. (A) The LOO prediction of scaffold binding performance obtained a 4/6 true positive rate, a 9/11 negative predictive value, and a precision (positive predictive value) of 4/6. Classification threshold was determined by ability to evolve a strong binding variant. (B) The predictive model is a linear combination of the 20 calculated parameters and a constant term. The coefficients describe which parameters to modify to improve binding performance of a small protein scaffold.
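
The three reported classification rates can be related to a single confusion matrix. The sketch below reads the fractions as unreduced counts (an inference, not something the source states): 4 true positives, 2 false negatives, 2 false positives, and 9 true negatives, which sum to the 17 assayed scaffolds. The function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Rates derived from confusion-matrix counts."""
    return {
        "true_positive_rate": tp / (tp + fn),         # sensitivity / recall
        "positive_predictive_value": tp / (tp + fp),  # precision
        "negative_predictive_value": tn / (tn + fn),
    }

# Counts inferred from the reported fractions (assumed unreduced).
counts = {"tp": 4, "fp": 2, "tn": 9, "fn": 2}
metrics = classification_metrics(**counts)
```

Under this reading, two strong-binding scaffolds were missed and two scaffolds were predicted to bind but did not yield a strong binder.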
Figure 5. Binding variants describe functional amino acid space. (A) The diversity of sequenced variants based upon matched residues per position. NNK distribution was estimated via 5000 random NNK paratope-diversified sequences with a 1/1000 chance of framework mutations (Q30). The Hamming distance was then summarized by 20 bins based upon the number of mismatched residues per paratope size. Error bars represent standard deviation of Hamming distance frequencies across scaffolds (n = 17 for NNK and initial, n = 12 for binding). (B) The change in amino acid frequencies of binding variants relative to the initial library for all paratope sites across all scaffolds.
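
For readers unfamiliar with NNK diversification, the expected amino acid distribution at a single NNK codon (N = A/C/G/T, K = G/T) and a length-normalized Hamming distance can be sketched as below. The codon table is the standard genetic code; the function names and the normalization by sequence length are illustrative assumptions, and the sketch does not model the random sampling or framework mutation rate mentioned in the caption.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def nnk_distribution():
    """Fraction of the 32 NNK codons that encode each amino acid or stop."""
    counts = Counter(CODON_TABLE[a + b + c]
                     for a, b, c in product("ACGT", "ACGT", "GT"))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def normalized_hamming(seq_a, seq_b):
    """Mismatched positions divided by sequence length (equal lengths required)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)

nnk = nnk_distribution()
distance = normalized_hamming("ACDE", "ACDF")  # one mismatch in four positions
```

NNK covers all 20 amino acids with a single stop codon (TAG), but unevenly: Leu, Ser, and Arg each have three of the 32 codons, which is one reason a library's amino acid frequencies are compared to the initial library instead of a uniform baseline.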
Figure 6. Limited protein producibility highlights the importance of scaffold developability. Each scaffold is classified by the ability to develop a strong binder (abundance > 1% in at least one campaign) and the parental protein producibility (ability to produce in T7 E. coli in detectable soluble yields). If applicable, the producibility of scaffold variants is shown as no. produced/no. attempted.
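
The two-way classification in this figure can be expressed as a small sketch. The strong-binder rule (abundance above 1% in at least one campaign) comes from the caption; the record layout, function names, and example values are hypothetical.

```python
from collections import Counter

def is_strong_binder(campaign_abundances, threshold=0.01):
    """True if abundance strictly exceeds the threshold in at least one campaign."""
    return any(a > threshold for a in campaign_abundances)

def cross_tabulate(records):
    """Count scaffolds in each (strong binder, producible) cell."""
    table = Counter()
    for abundances, producible in records:
        table[(is_strong_binder(abundances), producible)] += 1
    return table

# Hypothetical records: (abundance fraction per campaign, parental producibility).
example = [
    ([0.20, 0.00, 0.03], True),
    ([0.005, 0.001, 0.0], True),
    ([0.0, 0.0, 0.0], False),
]
table = cross_tabulate(example)
```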
Figure 7. Proteolytic stability assay identifies stability requirement for binding. (A) Protein scaffold variants were exposed to various levels of proteinase K and sorted based on degree of cleavage on the surface of yeast. The slope of the protease resistance (i.e., collection bin) versus protease concentration is correlated to protein stability. (B) The proteolytic stability of the parental scaffold is correlated to the binding performance of the scaffold. (Note: n.d. for Scaffold K.) (C) Violin plot comparing stabilities of naïve variants and binding variants. A Wilcoxon one-tailed signed rank test indicates that binding variants are less stable than naïve variants (p = 0.034).
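
The stability metric in panel A is a slope of collection bin against protease concentration. A minimal sketch of such a slope, assuming an ordinary least-squares fit on untransformed concentrations (the source does not specify the fitting procedure or any transformation), is shown below with made-up values.

```python
def resistance_slope(concentrations, bins):
    """Ordinary least-squares slope of collection bin versus protease concentration."""
    if len(concentrations) != len(bins) or len(bins) < 2:
        raise ValueError("need at least two paired observations")
    n = len(bins)
    mean_x = sum(concentrations) / n
    mean_y = sum(bins) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, bins))
    return sxy / sxx

# Made-up example: the assigned resistance bin drops as protease concentration rises.
slope = resistance_slope([0.0, 1.0, 2.0, 3.0], [4.0, 3.0, 2.0, 1.0])
```

In this toy example, a variant whose bin falls more steeply with concentration would give a more negative slope.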