
Biophysical Characterization Platform Informs Protein Scaffold Evolvability.

Alexander W Golinski, Patrick V Holec, Katelynn M Mischler, Benjamin J Hackel.

Abstract

Evolving specific molecular recognition function of proteins requires strategic navigation of a complex mutational landscape. Protein scaffolds aid evolution via a conserved platform on which a modular paratope can be evolved to alter binding specificity. Although numerous protein scaffolds have been discovered, the underlying properties that permit binding evolution remain unknown. We present an algorithm to predict a protein scaffold's ability to evolve novel binding function based upon computationally calculated biophysical parameters. The ability of 17 small proteins to evolve binding functionality across seven discovery campaigns was determined via magnetic activated cell sorting of 10¹⁰ yeast-displayed protein variants. Twenty topological and biophysical properties were calculated for 787 small protein scaffolds and reduced into independent components. Regularization deduced which extracted features best predicted binding functionality, providing a 4/6 true positive rate, a 9/11 negative predictive value, and a 4/6 positive predictive value. Model analysis suggests a large, disconnected paratope will permit evolved binding function. Previous protein engineering endeavors have suggested that starting with a highly developable (high producibility, stability, solubility) protein will offer greater mutational tolerance. Our results support this connection between developability and evolvability by demonstrating a relationship between protein production in the soluble fraction of Escherichia coli and the ability to evolve binding function upon mutation. We further explain the necessity for initial developability by observing a decrease in proteolytic stability of protein mutants that possess binding functionality over nonfunctional mutants. Future iterations of protein scaffold discovery and evolution will benefit from a combination of computational prediction and knowledge of initial developability properties.

Keywords:  predictive algorithm; protein evolvability; protein scaffolds

Year: 2019 · PMID: 30681831 · PMCID: PMC6458986 · DOI: 10.1021/acscombsci.8b00182

Source DB: PubMed · Journal: ACS Comb Sci · ISSN: 2156-8944 · Impact factor: 3.784


Introduction

Proteins have evolved to empower a broad array of functionality. While minimal amino acid mutations can yield dramatic enhancements in functional performance via evolution,[1,2] discovery of completely new function typically requires greater leaps in sequence.[3] Given the relative barrenness and tortuosity of sequence space,[2] efficient strategies are needed to achieve successful de novo discovery. One strategy to facilitate discovery is the use of a protein scaffold[4,5] comprising a conserved framework to provide biophysical robustness and a variable active site to provide diverse function. One particular function, molecular recognition via binding ligands, has ubiquity in natural biology and broad technological utility in targeted molecular therapies[6] and diagnostics.[7] A functional protein ligand scaffold must be able to evolve new, specific binding function upon mutation of the paratope[8] and possess optimal developability properties (e.g., stability, solubility, and expression) for downstream use.[9] To date, numerous protein scaffolds have been engineered to obtain strong affinity toward clinically relevant targets,[10,11] while some have entered clinical trials.[12−15] Protein scaffolds offer novel topologies and differential size, allowing for unique binding interfaces and tunable pharmacokinetic properties.[16,17] The diversity of topologies and physicochemistries of published scaffolds and the paucity of data on unsuccessful scaffolds preclude an understanding of the biophysical features that allow the development of binding functionality. Thus, to advance the understanding of de novo protein discovery and evolution, as well as to advance technological capability for ligand engineering, we sought to develop a platform to elucidate the factors that dictate scaffold performance and to identify new scaffolds. Previously established scaffolds have been discovered based on an evolutionary or mechanically themed hypothesis. 
The use of antibodies,[6] antibody fragments,[18] and leucine-rich repeats[19] presumed that their natural function for high affinity binding would serve as a starting point for scaffold engineering. Fibronectin type III “monobodies”[20] and designed ankyrin repeat proteins[21] are structurally similar to these immune scaffolds. Lipocalins,[22] three-helix bundle affibodies,[23] fynomers,[24] and others[17] offer unique topologies with native binding ability. Alternatively, multiple scaffolds are chosen for their strong structural stability, including cystine knots[25] and thermophilic affitins[26] and homologues. Similarly, a host of other scaffolds have provided compelling performance in ligand development, while others have been tested without the same level of success.[21] A comparison of potential scaffolds was recently performed, which identified the Gp2 scaffold for its small size, adjacent, solvent-exposed loops with significant surface area, and mutational tolerance.[10] However, a rigorous evaluation of the properties that permit protein scaffold function, now enabled by advances in high-throughput screening and sequencing, has yet to be performed. Herein, we propose an iterative discovery and evaluation platform for new protein scaffolds in which we computationally characterize biophysical properties of scaffold topologies and experimentally evaluate binder evolution (Figure 1). Parameter selection techniques are then employed to assess predictive characteristics of evolvable scaffolds. In this Research Article, computationally derived stability and topology parameters were used to identify the first predictive model of protein scaffold function, which can be used to identify future successful protein scaffold candidates. Additionally, experimental characterization of scaffold developability suggests stable and producible proteins yield improved binder evolution to combat a trade-off between stability and new binding function.
The findings in the study suggest a combination of developability and biophysical metrics should be used to identify future protein scaffolds.
Figure 1

Algorithm for protein scaffold discovery. Small proteins deposited in the Protein Data Bank are analyzed for structural, chemical, and predicted stability parameters. Proteins for experimental evaluation are chosen via a proposed model to predict binding performance. Protein scaffold libraries consisting of millions of unique variants are expressed with diversified binding interfaces. Binding function is evaluated against several molecular targets to determine which proteins evolve specific binding variants. The observed binding performance is then used to adjust the predictive model. Iterative evaluation can be performed.

Results and Discussion

Computational Scaffold Analysis

We hypothesize that not all proteins possess the characteristics to robustly and efficiently evolve novel binding function upon mutation. To advance the understanding of scaffold properties that dictate evolvability, and to reduce the experimental burden of identifying new scaffolds or improving existing scaffolds, we aim to advance a computational/experimental framework to evaluate binding evolvability of candidates. We hypothesize that a combination of topological and biophysical parameters can be used to provide insight on performance. We focused the current study on small (<65 amino acids), single-domain proteins for multiple reasons. Small proteins provide improved physiological transport and rapid clearance of unbound molecules for enhanced selectivity.[27] Small, single-domain architecture eases fusion and site-specific conjugation for multifunctional constructs. The small size reduces exposed surface area that may lead to undesired nonspecific interactions. Moreover, small size heightens the challenge to simultaneously balance evolution of intramolecular stability and intermolecular binding,[28,29] which makes it a strong test case for evolution. Multiple types of protein structure can be used for diversification of a binding paratope including loops,[20,30] α-helices,[31,32] β-strands,[33] and mixed topologies.[21,22] Although the impact of entropic cost upon binding,[34,35] relative to more constrained paratope structures, remains difficult to accurately assess, the conformational flexibility of loops suggests this secondary structure will be most accepting of mutagenesis.[36] Thus, we sought proteins with at least two enclosed loop regions each with at least four residues for diversification. The >100 000 proteins in the Protein Data Bank (PDB) were (i) filtered for size (30–65 AA pretrimming) and the presence of two loops with at least four residues.
787 unique protein scaffolds were (ii) demarcated into conserved frameworks and diversifiable paratopes and (iii) characterized by 20 parameters describing geometrical, chemical, and stability properties (summarized in Table 1 and the following text and described in depth in Experimental Procedures). (1) Protein Connectivity. We hypothesized that the connectivity of residues would impact protein stability, leading to the calculation of inter-residue contact degree (total and long-range) and contact order.[37] (2) Paratope Connectivity. Paratope connectivity and flexibility, the latter via normal-mode analysis,[38] were also calculated as we believed spatially removed diversifications will be less destabilizing to the remainder of the protein. (3) Conserved Surface Area Chemical Nature. As for the conserved framework, the amount and chemical nature of exposed residues are likely to affect the ability of proteins to withstand destabilizing mutations. PyMOL[39] was used to model the protein surface and calculate the chemical nature of the solvent accessible surface area (SASA). (4) Paratope Size and Topology. Paratope orientation was parametrized by spatial and angular separation to capture the potential additivity of the two paratope loops. Paratope size and shape were described by measuring the properties of the 2D and 3D binding interface. (5) Computational Stability. It is proposed that scaffolds must be stable and have mutational stability to maintain structural integrity when obtaining binding function. The FoldX empirical force field was used to estimate mutational destabilization and overall stability.[40] The amount of buried nonpolar surface area was also estimated as a relationship with stability was recently observed for small proteins.[41] (6) General Scaffold Properties. We propose the amount of new SASA introduced by cleaving termini may introduce destabilizing exposed surfaces.
Termini without secondary structure were removed from experimental and computational analysis except in the calculation of new SASA. We also included descriptions of the amount of common secondary structure and total residues. Small protein topologies exhibit a broad range of values for these 20 parameters (Figure 2R), which provides potential utility for scaffold differentiation. Seventeen candidate scaffolds (Figure 2A–Q), which provide a range of characteristics (Figure 2R), were chosen for experimental evaluation.
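The size and loop filter in step (i) can be illustrated with a minimal sketch. It assumes each candidate is summarized by a per-residue secondary-structure string using a hypothetical one-letter encoding ('H' helix, 'E' strand, 'L' loop); the text does not state how loops were assigned, so the encoding, the function names, and the reading of "enclosed" as "flanked by secondary structure on both sides" are assumptions made for illustration.

```python
import re

def enclosed_loops(ss, min_len=4):
    """Return (start, end) spans of loop runs that are flanked by secondary
    structure on both sides and contain at least min_len residues.
    ss uses a hypothetical encoding: 'H' helix, 'E' strand, 'L' loop."""
    loops = []
    for match in re.finditer(r"L+", ss):
        start, end = match.start(), match.end()
        # A run touching either terminus is not enclosed by structure.
        is_enclosed = start > 0 and end < len(ss)
        if is_enclosed and end - start >= min_len:
            loops.append((start, end))
    return loops

def passes_filter(ss, min_size=30, max_size=65, min_loops=2):
    """Size window of 30-65 residues and at least two qualifying loops."""
    return min_size <= len(ss) <= max_size and len(enclosed_loops(ss)) >= min_loops

# Toy 40-residue example: strand, 5-residue loop, helix, 4-residue loop, strand.
candidate = "LL" + "E" * 8 + "L" * 5 + "H" * 10 + "L" * 4 + "E" * 8 + "LLL"
print(enclosed_loops(candidate), passes_filter(candidate))
```

Terminal loop runs are excluded here because unstructured termini are trimmed in the analysis; a real pipeline would derive the string from a structure-based assignment rather than a hand-written example.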
Table 1

Evaluated Descriptors of Protein Scaffolds

factor | description | mean ± SD (n = 787)

protein connectivity
contact degree | total number of residue contacts within 8 Å | 920 ± 270 AU
contact order | sum of contact sequence separation divided by size and contact degree | 0.38 ± 0.01 AU
long-range contact degree | number of residue contacts with sequence separation >12 divided by size | 11.8 ± 3.1 AU

paratope connectivity
paratope contact degree | total number of residue contacts within 8 Å between a paratope and conserved residue | 430 ± 140 AU
paratope contact order | sum of paratope contacts sequence separation divided by paratope size and contact degree | 1.2 ± 0.4 AU
paratope stiffness | average stiffness of the paratope in an anisotropic network model | –0.28 ± 0.39 AU

conserved surface area chemical nature
charged SASA | conserved solvent accessible surface area of D, E, K, R | 980 ± 430 Å²
hydrophobic SASA | conserved solvent accessible surface area of A, F, G, I, L, M, P | 790 ± 340 Å²
polar SASA | conserved solvent accessible surface area of C, H, N, Q, S, T, W, Y | 780 ± 360 Å²

paratope size and topology
paratope angle | [paratope 1: entire scaffold: paratope 2] angle based upon centers of volume | 110 ± 30°
paratope SASA | solvent-exposed surface area of an alanine-scanned paratope region | 780 ± 360 Å²
paratope separation | distance between the center of volumes of the paratopes | 16 ± 6 Å
projected paratope area | two-dimensional projected area of the paratope in the orientation of maximum area | 74 ± 25 AU
projected paratope perimeter | perimeter of the projected area of the paratope in the orientation of maximum area | 1.2 ± 0.4 AU

computational stability
buried NPSA | amount of buried nonpolar surface area upon folding | 2700 ± 900 Å²
FoldX DDG | mean difference in stability from parental across 50 variants | 17 ± 12 kJ/mol
FoldX energy | mean energy of 50 NNK variants using FoldX’s forcefield | 35 ± 25 kJ/mol

general scaffold properties
new SASA | amount of solvent exposed area created when removing unstructured termini | 320 ± 260 Å²
secondary structure percent | percent of residues in an α-helix or β-sheet | 51 ± 12%
size | total number of residues in the scaffold | 47 ± 7 AA
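As an illustration of the protein connectivity descriptors in Table 1, the following sketch computes contact degree, contact order, and long-range contact degree from the definitions given in the table. It assumes one coordinate per residue (for example a Cα position); the atom selection used in the study is described in its Experimental Procedures and may differ, so absolute values from this sketch are not expected to reproduce the tabulated means. The function name and toy coordinates are hypothetical.

```python
from math import dist

def contact_metrics(coords, cutoff=8.0, long_range_sep=12):
    """coords: list of (x, y, z) tuples, one per residue in sequence order.
    Returns (contact degree, contact order, long-range contact degree)."""
    n = len(coords)
    # A contact is any residue pair whose representative points lie within the cutoff.
    contacts = [(i, j) for i in range(n) for j in range(i + 1, n)
                if dist(coords[i], coords[j]) <= cutoff]
    degree = len(contacts)
    # Contact order: summed sequence separation divided by size and contact degree.
    separation_sum = sum(j - i for i, j in contacts)
    order = separation_sum / (n * degree) if degree else 0.0
    # Long-range contact degree: contacts with separation > 12, divided by size.
    long_range = sum(1 for i, j in contacts if j - i > long_range_sep) / n
    return degree, order, long_range

# Toy example: five residues on a straight line, 3.8 Å apart.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
degree, order, long_range = contact_metrics(toy_coords)
print(degree, round(order, 3), long_range)
```

In the toy chain only neighbors one and two positions apart fall within 8 Å, so the contact degree is 7 and no contact counts as long-range.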
Figure 2

Protein scaffold candidates show varying binding performance. (A–Q) The 17 assayed protein scaffolds with conserved region colored gray and variable paratope colored red. (R) 787 protein scaffolds of 30–65 amino acids with two solvent-exposed loops were computationally analyzed for 20 topological and biophysical factors (Table 1). The z-score distributions across all scaffolds are depicted by the box plots (box, 25–75th percentile; center bar, median; whiskers, 1.5 × interquartile range). The plotted values for each of the 17 assayed scaffolds indicate a diversity of proteins were assayed. (S) A pooled sample of 1 × 10¹⁰ variants across 17 scaffolds was enriched for binding variants in seven campaigns. MACS sorting was performed until seven binding populations were identified toward diverse molecular targets. Positive selection sorts (bold molecular target) were completed after two depletion sorts of the other listed targets. Binding functionality, quantified here as increased relative yield over control beads, was observed in all campaigns. (T) The relative binding performance for each scaffold against each molecular target as determined by the difference in scaffold abundance from the initial population to the binding populations. Scaffold abundance combines unique variants and variant binding strength using exponential dampening of sequence counts. Inset: The initial abundance of each scaffold. Error bars represent standard error (n = 3).

Scaffold Binding Evaluation

To evaluate scaffold evolvability, we performed de novo discovery of binding ligands from a merged combinatorial library of all 17 scaffolds. Combinatorial libraries were genetically synthesized in which the two paratope loops were diversified with 8–17 (mean 11.3) “NNK” degenerate codons, which enable all 20 natural amino acids. The gene libraries were transformed into a yeast surface display system to robustly produce scaffold variants, which yielded 3–9 × 10⁸ variants per scaffold. The 17 scaffold libraries were mixed resulting in a total diversity of 1 × 10¹⁰ protein variants. Deep sequencing revealed that the synthesized library matched design with only 1.2% median deviation from NNK diversity and a 1.1% framework mutation rate. The pooled library was sorted to identify specific binding ligands to a panel of diverse proteins: luciferase, CTLA4, avidin, PD-1, green fluorescent protein, R-phycoerythrin, and vascular endothelial growth factor. Four to five rounds of magnetic activated cell sorting were used to deplete nonspecific binders and enrich selective binders. Maximum diversity of the sequenced population, estimated by the lowest-yielding sort with each cell containing a unique variant, ranged from 3500 to 715 000 per campaign. Enriched populations exhibited selective binding (Figure 2S) and were deep sequenced to characterize scaffold variants. 280 000 (range = 1250–115 000 per campaign) full-length reads were obtained yielding 21 000 (range = 160–9000 per campaign) unique binding variants. Individual campaign sorting and sequencing statistics are summarized in Table S1. With oversampled sorting, enrichment is correlated with binding affinity.[42] MACS sorts were performed with at least 10-fold diversity of yeast, allowing for differential recovery among clones of various binding strength.
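The statement that NNK degenerate codons enable all 20 natural amino acids can be checked directly. The sketch below enumerates the NNK codons (N = A, C, G, or T; K = G or T) against the standard genetic code; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]
encoded = Counter(CODON_TABLE[codon] for codon in nnk_codons)

amino_acids = sorted(aa for aa in encoded if aa != "*")
print(len(nnk_codons), "codons,", len(amino_acids), "amino acids,",
      encoded["*"], "stop codon")
```

Because the 32 codons are not evenly distributed across amino acids (for example, leucine, serine, and arginine each have three NNK codons while several residues have one), the naïve library is not uniform at the amino acid level, which is why changes are reported relative to the initial library rather than to a flat expectation.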
While our depth of sequencing did not fully sample the theoretical diversity, the differential frequencies of obtained variant reads suggest the obtained results reflect the differential affinities of the assayed scaffold variants. The overall binding performance of a scaffold was calculated as the mean difference in normalized abundance between the final and initial binding populations after transforming (quartic-root dampening[43]) sequence frequencies to combine the binding strength and the number of unique binding variants. It should be acknowledged that the binding performance metric in this study is dependent on the performances of the other tested scaffolds, and only provides a relative comparison between scaffolds. To define a threshold value of performance, a binding performance of −0.006 was determined to best classify experimental binding performance by the ability to develop a strong binding variant (Figure S1). The assayed protein scaffolds possessed a range of ability to evolve novel binding function upon paratope mutations (Figure 2T). Five scaffold libraries failed to contain binding variants in any campaign: scaffolds C, F, and I maintained a near-neutral score as the starting abundance was rare, whereas scaffolds G and Q performed comparatively worse as each sequence had more potential to find binding variants. Scaffolds D and L produced binders to a single target. Yet, the binding was not strong relative to other binders, which rendered the scaffolds’ overall performances as poor. Libraries of scaffolds A, B, E, H, J, K, M, N, O, and P contained binders to more than one target, with A, E, H, J, K, N, and O producing binders with sequences that occupied ≥1% of the reads for a campaign (Figure S2). Scaffolds J, H, O, and P increased abundance in at least one campaign but overall yielded a negative performance (i.e., depletion in frequency upon evolution). Four scaffolds (A, E, K, and N) yielded an increased abundance across the study (Figure 3).
Scaffolds A, E, and N had an increase in normalized abundance above 0.1 in two or more campaigns. Scaffold A, a binding subunit of the chaperone protein calreticulin with a relatively extended fold exposing both diversified loop regions, was found in all binding campaigns. Scaffold E, an RNA polymerase inhibitor, presents a pair of solvent-exposed loops on one end of a scaffold in which a single α-helix packs across from a β-sheet. This topology, recently identified via scaffold mining,[10] has been validated as a protein scaffold and serves as a positive control for this experiment. Scaffold N, an actin-binding protein presenting a pair of loops between three relatively small helices, obtained binding function in six campaigns with only 9 diversified sites. Scaffold K, an antifungal protein, dominated the fourth binding campaign and comprises three interacting β-sheets. These scaffolds offer diverse options for ligand evolution and provide, along with analysis of the other scaffolds, a means by which to evaluate the impact of topological and biophysical parameters on scaffold evolvability.
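The binding performance metric described above (quartic-root dampening of sequence counts, normalization across scaffolds, and the mean difference between binding and initial populations) can be sketched as follows. The exact normalization and averaging used in the study are not fully specified in this text, so the order of operations, the function names, and the toy read counts here are assumptions for illustration only.

```python
def dampened_abundance(reads_by_scaffold, power=0.25):
    """reads_by_scaffold: {scaffold: [read count of each unique variant]}.
    Each count is raised to the 1/4 power so a single dominant clone cannot
    swamp many distinct binders; totals are then normalized across scaffolds."""
    raw = {s: sum(count ** power for count in counts)
           for s, counts in reads_by_scaffold.items()}
    total = sum(raw.values())
    return {s: value / total for s, value in raw.items()}

def binding_performance(initial, campaigns):
    """Mean over campaigns of (binding abundance - initial abundance)."""
    start = dampened_abundance(initial)
    finals = [dampened_abundance(campaign) for campaign in campaigns]
    return {s: sum(f.get(s, 0.0) - start[s] for f in finals) / len(finals)
            for s in start}

# Toy data: two scaffolds with equal starting abundance, one campaign.
initial = {"A": [1, 1, 1, 1], "B": [1, 1, 1, 1]}
campaigns = [{"A": [16, 1], "B": [1]}]
scores = binding_performance(initial, campaigns)
print(scores)
```

Because abundances are normalized within each population, a gain for one scaffold is a loss for the others, which matches the caveat that the metric only provides a relative comparison between scaffolds.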
Figure 3

Successful protein scaffolds have diverse topologies. The identity, natural function, structure, and sequence of the top performing scaffolds are presented. The top proteins have various amounts and types of secondary structure. Diversified paratope residues are colored red in both the primary sequence and PyMOL rendering of the protein. Strikethroughs in the sequence represent residues present in the solved structure that were removed in our experimental analysis (as unstructured termini).

We would like to acknowledge a few limitations in the analysis of scaffold performance using the employed methodology in the experiment. Scaffold libraries may under- or overperform their overall evolvability for multiple reasons. The diversified sites may not be optimal as evolution can be aided by conservation of loop sites[44] and diversification of sites with secondary structure adjacent to paratope.[32,44] Full amino acid diversity is not optimal for evolution at many sites.[32,44] Yet the library designs that optimally balance intramolecular stability and intermolecular binding potential are not evident a priori. Thus, for consistency of scaffold evaluation, this common diversification strategy was employed. Additionally, assessing binding functionality via multivalent MACS with multivalent yeast display only requires moderate affinity. As our ability to identify functional scaffolds increases, modifying the selection stringency may modify scaffold performance and associated predictive parameters. There are several potential sources of variability in the experiments. Illumina preparation could have PCR bias;[45] however, initial library sequencing identified all scaffolds and our evolvability metric accounts for differences in initial abundance, which mitigates this issue. Additional differences in initial abundance could be explained by differential library construction efficiency.
Severe undersampling of the theoretical 10¹⁶ variants yields potential stochasticity; however, the depth and breadth of evolved binders (21 000 unique sequences) provides a generalizable result. Finally, it is observed that not all scaffolds perform equally for all targets. The use of seven campaigns addresses this concern, and future experiments may benefit from further increasing campaign breadth.

Identifying Evolvable Scaffold Properties

To evaluate a generalizable impact of topological and biophysical parameters on scaffold evolvability, a tandem independent component analysis (ICA) and elastic net regularization protocol was performed. Given the extensive resources required to evaluate numerous scaffold performances, we sought to predict performance from our limited data set while avoiding overfitting. Briefly, the 20 calculated factors for 787 potential scaffolds were z-transformed and subsequently whitening transformed by principal component analysis to determine orthogonal metavariables, which describe variability between scaffolds in lower dimensional space and remove correlation (Figure S3). Six scaffold features were then reconstructed using ICA to identify underlying independent features describing protein scaffolds (Figure S4). The six independent components for the 17 assayed scaffolds were then fed into an elastic net regularization to determine predictive descriptions of scaffold binding performance. Regularization penalizes the norm of term coefficients, removing terms which do not aid predictive power. The technique isolated two components which best reduced a leave-one-out (LOO) root mean squared error (RMSE) in predicting scaffold performance (Figures 4A and S4). The final model was composed of a constant term, to account for bias in the definition of scaffold performance, and two independent components. The most predictive model successfully identifies 4 of the 6 functional scaffolds above the determined threshold. Nine of the 11 scaffolds predicted to be less evolvable indeed fit that description. Yet the model does result in false positives for 2 of 6 scaffolds.
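The three classification statistics quoted for the model follow from a single confusion matrix over the 17 assayed scaffolds. The counts below are inferred from the text (6 functional scaffolds of which 4 were identified; 11 predicted less evolvable of which 9 were correct), not taken from a published table, and the variable names are illustrative.

```python
from fractions import Fraction

# Counts inferred from the reported fractions.
true_pos = 4    # functional and predicted functional
false_neg = 2   # functional but predicted less evolvable (6 - 4)
true_neg = 9    # nonfunctional and predicted less evolvable
false_pos = 2   # nonfunctional but predicted functional (17 - 11 - 4)

total = true_pos + false_neg + true_neg + false_pos

true_positive_rate = Fraction(true_pos, true_pos + false_neg)           # 4/6
negative_predictive_value = Fraction(true_neg, true_neg + false_neg)    # 9/11
positive_predictive_value = Fraction(true_pos, true_pos + false_pos)    # 4/6

print(total, true_positive_rate, negative_predictive_value,
      positive_predictive_value)
```

`Fraction` reduces 4/6 to 2/3, so the printed values read 2/3, 9/11, and 2/3; the counts sum to the 17 assayed scaffolds, which confirms that the three reported fractions are mutually consistent.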
Figure 4

Large disconnected paratopes are associated with increased binding performance. ICA analysis was completed to describe the independent features of protein scaffolds. Elastic net regularization was performed to determine which of the features predicted binding performance. The resulting linear model was composed of two independent components and a constant term yielding a LOO RMSE of 0.06. (A) The LOO prediction of scaffold binding performance obtained a 4/6 true positive rate, a 9/11 negative predictive value, and a precision (positive predictive value) of 4/6. Classification threshold was determined by ability to evolve a strong binding variant. (B) The predictive model is a linear combination of the 20 calculated parameters and a constant term. The coefficients describe which parameters to modify to improve binding performance of a small protein scaffold.

By distributing the weights of the independent components in the model back onto the calculated biophysical parameters, we can hope to obtain a physical understanding of what predicts scaffold success. On the basis of the linear model term coefficients, the predicted model suggests generally decreasing scaffold connectivity, paratope connectivity, conserved exposed surface area, buried nonpolar surface area, FoldX energy, secondary structure, and size (Figure 4B). It also suggests increasing paratope 2D and 3D surface area, 2D perimeter, and exposing new surface area upon removal of unstructured termini. While an exact interpretation of the model is complex, a general trend appears to suggest a large, disconnected paratope may predict increased binding performance. The distribution of binding performance of all predicted scaffolds can be found in Figure S5. While several approaches to identify predictive biophysical parameters could have been utilized, we identified what we believe to be the most compelling approach using underlying features of protein scaffolds.
For thoroughness, we also tested a similar approach utilizing principal components, which best describe differences between scaffolds, yielding a comparable outcome in terms of predictability and parameter insight (Figure S6). Both models agree on reducing protein and paratope contacts, minimizing conserved SASA, and increasing paratope SASA yet differ in the impact of paratope stiffness, FoldX energy, and new SASA. In a third approach, each individual parameter was analyzed to determine predictive performance. The top two predictive models in terms of minimizing LOO RMSE also suggest a decrease in conserved polar SASA or an increase in paratope SASA.
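The single-parameter comparison can be sketched as a leave-one-out loop over one-variable linear fits. The form of the per-parameter model is not stated in this text, so ordinary least squares is swapped in here as an assumption; the function names and the toy data are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def loo_rmse(xs, ys):
    """Hold out each scaffold once, fit on the rest, and score the held-out point."""
    squared_errors = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))

def rank_parameters(parameter_table, performance):
    """Sort parameters by LOO RMSE, most predictive first."""
    return sorted((loo_rmse(values, performance), name)
                  for name, values in parameter_table.items())

# Toy data: one parameter tracks performance exactly, the other does not.
performance = [0.1, 0.2, 0.3, 0.4, 0.5]
parameter_table = {"tracks_performance": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "unrelated": [3.0, 1.0, 4.0, 1.0, 5.0]}
ranking = rank_parameters(parameter_table, performance)
print(ranking[0][1])
```

Scoring each model only on the scaffold it did not see is what guards against overfitting when just 17 scaffolds are available.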

Paratope Analysis

We sought to analyze the characteristics of the evolved scaffold variants to illuminate any trends which may aid in future paratope design. We first asked if the binding variants for each scaffold were closely related in sequence space by plotting the distribution of pairwise Hamming distances for each scaffold (Figure 5A). A paratope size normalized Hamming distance of 1 represents a completely unique paratope by position. A distance less than 1 represents variants with more similar paratope motifs. On the basis of the Hamming distance, only 2 of 12 binding scaffolds significantly reduced the sequence space from their initial distribution (P < 0.05, one-tailed Kolmogorov–Smirnov Test with Bonferroni correction for multiple comparisons). The similar Hamming distance distribution between the initial and binding populations provides evidence that the populations have roughly the same extent of diversity. The decreased distance for some scaffolds suggests that not all sequence space is functional in evolving novel binding function for some scaffolds but proves the results of our assay are not dominated by single binding motifs. Additionally, the mutational rate of the conserved residues of the binding proteins was 5% (relative to 1.1% in the naïve library), suggesting some mutations outside of the paratope may benefit binding evolution.
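The paratope size normalized Hamming distance and its pairwise distribution can be sketched as below. The 20-bin summary follows the Figure 5 caption; the function names and toy sequences are illustrative, and the sketch assumes all paratopes from one scaffold have equal length.

```python
from collections import Counter
from itertools import combinations

def normalized_hamming(a, b):
    """Fraction of paratope positions that differ (1.0 = no shared residue by position)."""
    if len(a) != len(b):
        raise ValueError("paratopes must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_histogram(paratopes, bins=20):
    """Frequency of pairwise normalized Hamming distances in equal-width bins."""
    counts = Counter()
    for a, b in combinations(paratopes, 2):
        d = normalized_hamming(a, b)
        counts[min(int(d * bins), bins - 1)] += 1   # d == 1.0 goes in the last bin
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(bins)]

# Toy paratopes of four residues each.
toy = ["AAAA", "AAAC", "CCCC"]
hist = distance_histogram(toy)
print(normalized_hamming("ACDE", "ACDF"), [i for i, f in enumerate(hist) if f])
```

A histogram for the binding population that shifts toward lower bins relative to the initial library would indicate convergence on related paratope motifs.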
Figure 5

Binding variants describe functional amino acid space. (A) The diversity of sequenced variants based upon matched residues per position. NNK distribution was estimated via 5000 random NNK paratope-diversified sequences with a 1/1000 chance of framework mutations (Q30). The Hamming distance was then summarized by 20 bins based upon the number of mismatched residues per paratope size. Error bars represent standard deviation of Hamming distance frequencies across scaffolds (n = 17 for NNK and initial, n = 12 for binding). (B) The change in amino acid frequencies of binding variants relative to the initial library for all paratope sites across all scaffolds.

We then analyzed the evolution of paratope composition to assess the impact of particular amino acids on the creation of binding function (Figure 5B). Tryptophan and tyrosine increased by 12% and 3%, respectively; both have been previously reported to interact specifically across many interfaces because of their ability to partake in different bonds including π-stacking, hydrogen-bonding, and cation−π interactions.[46−48] Arginine, which often serves as a hot-spot residue for key interactions but has also been previously associated with nonspecific interactions, increased by 3%.[46−48] Glycine increased in abundance by 3%, perhaps by adding flexibility to the loop regions.[49] Proline increased in abundance by 2%, perhaps by improving scaffold stability by reducing the conformational entropy of the unfolded state.[49] Interestingly, serine has previously been shown to be enriched in binding variants but was greatly reduced in this study.[46−48] The raw abundance of each residue in the various sequencing populations is depicted in Figure S7.

Developability Impacts Scaffold Performance

In addition to evolving novel binding function upon mutation, the developability of a protein scaffold is also important for utility as a molecular targeting agent. We define a developable protein as one that possesses high producibility, stability, solubility, and other usability factors. While the preceding experimental evolution did not directly select for developability, we sought to provide an introductory analysis of developability metrics of the studied scaffolds. We produced protein scaffold variants recombinantly in Escherichia coli to determine whether recombinant yield was predictive of scaffold performance (Figure 6). Parental proteins, evolved binding variants, and random variants from the naïve library were expressed via pET plasmids in T7 Express E. coli. Soluble protein was identified via PAGE gel analysis, FPLC purification, and anti-His tag ELISA. We found that modifying the temperature and time of induction impacted protein yield for producible clones but did not rescue any poorly produced proteins.
Figure 6

Limited protein producibility highlights the importance of scaffold developability. Each scaffold is classified by the ability to develop a strong binder (abundance > 1% in at least one campaign) and the parental protein producibility (ability to produce in T7 E. coli in detectable soluble yields). If applicable, the producibility of scaffold variants is shown as no. produced/no. attempted.

On the basis of the detection of parental protein in the soluble fraction of T7 E. coli, scaffolds whose parental protein is effectively produced in the soluble fraction tend to have a higher probability of evolving a strong binding variant (one-tailed two-sample proportion test, p = 0.057). Under the hypothesis that expressed proteins must be stable, have low aggregation propensity, and readily fold, these data suggest that well-behaved proteins will serve as a better starting point for scaffold discovery. Additionally, the data recommend that protein scaffolds be derived from highly developable proteins, rather than engineering developability after binding functionality has been identified. Interestingly, the ability of a parental clone to be produced was not indicative of variant producibility (p = 0.3).

Proteolytic Stability

We then sought to characterize the stability of scaffold variants on the surface of yeast, where binding function was observable and more complex protein production machinery exists. Using proteinase K, flow cytometry, and deep sequencing, the relative proteolytic stability of 1300 unique scaffold variants was determined by analyzing the amount of protease required to cleave the distal epitope tag on a yeast-surface-displayed scaffold variant (Figure 7A). The method could be influenced by protein aggregation protecting variants from cleavage. Notably, the scaffold A parental variant was resistant to cleavage yet was found in multimeric states by PAGE and mass spectrometry upon recombinant soluble expression. Nevertheless, this high-throughput analysis informs on stability, as recently validated.[41]
Figure 7

Proteolytic stability assay identifies stability requirement for binding. (A) Protein scaffold variants were exposed to various levels of proteinase K and sorted based on degree of cleavage on the surface of yeast. The slope of the protease resistance (i.e., collection bin) versus protease concentration is correlated to protein stability. (B) The proteolytic stability of the parental scaffold is correlated to the binding performance of the scaffold. (Note: n.d. for Scaffold K.) (C) Violin plot comparing stabilities of naïve variants and binding variants. A Wilcoxon one-tailed signed rank test indicates that binding variants are less stable than naïve variants (p = 0.034).

We first examined the stability of the parental variants for each scaffold and observed a positive correlation with the scaffold’s binding performance during MACS sorting (Spearman’s ρ = 0.56, p < 0.05; Figure 7B). The shape of the relationship suggests that a threshold of stability is required to obtain high binding performance. We then tested the hypothesis that the stability of randomly diversified variants correlates with parental protein stability. We measured the stability of an average of 60 variants per scaffold (range = 14–73; Figure S8). A large range of stabilities was observed among the naïve variants without any evident correlation with parental stability (Spearman’s ρ = 0.43, p = 0.1). This outcome could be explained by the substantial diversification of the initial pool, which is likely to contain variants both close to and far from the parental clone. A final comparison was performed between the stabilities of naïve variants and binding variants for each scaffold. Interestingly, the protease stability of binding variants is significantly lower than that of nonbinding variants (one-tailed Wilcoxon signed-rank test on set medians, p = 0.034; Figure 7C).
This suggests there is a trade-off between binding functionality and stability, as previously hypothesized.[50,51] Paired with the relationship between parental protease stability and scaffold binding function, we hypothesize that protein scaffolds with high protease stability will more efficiently evolve binding variants because they can “sacrifice” stability while remaining folded. This suggests that the search for future protein scaffolds should first involve a comprehensive study of protein stabilities and expression. This additional test may aid in the differentiation of proteins with otherwise similar biophysical properties when predicting evolvability as protein scaffolds.

Conclusion

The current study develops a computational-experimental platform to identify successful protein scaffolds and provides insight on the topological and biophysical parameters that dictate evolvability. However, the ability to develop specific binding function is not enough for a scaffold to be useful in downstream applications. The stability and producibility of the proteins also determine scaffold utility. Interestingly, these developability factors also correlate to binding evolvability of the protein scaffold. Future work in this field should combine the predictive biophysical model and the observed relationship between protein stability and scaffold functionality to narrow the assayed candidates. We also note that this method of computationally calculating biophysical parameters of proteins to relate to desired functionality is applicable beyond protein scaffold identification. A similar analysis could be completed to determine predictive performances of protein developability metrics, enzyme efficacy, and antimicrobial peptide activity. The current limitation in such studies is the collection of a sufficiently rich data set to build a robust computational model.

Experimental Procedures

Scaffold Parameter Calculation

Protein Data Bank files were obtained for entries containing a protein chain of 30 to 65 amino acids. Chains were then parsed for unique sequence and secondary structure as determined by the depositor. Paratope loop regions were assigned as continuous stretches of at least four amino acids without secondary structure. Terminal amino acids were removed if located 3 or more residues from the outermost secondary structure. Custom Python scripts were then used to calculate 20 parameters. Scripts are available online on GitHub: https://github.com/HackelLab-UMN.

Protein Connectivity

We hypothesize that a more connected protein is correlated with increased stability but decreased mutational stability. The distances between residue β-carbons (or α-carbon for glycine) are measured for all residues in the terminal-trimmed protein. Residues with Euclidean distances of ≤8 Å are considered contacts, consistent with ranges found in the literature.[37] Three parameters are calculated: (1) contact degree, the total number of contacts; (2) contact order, the sum across all contacts of the difference in primary sequence index, normalized by contact degree and the total number of residues; and (3) long-range contact degree, the number of contacts with a difference in primary sequence index greater than 12, normalized by the total number of residues.
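The three connectivity parameters can be illustrated with a short sketch. This is not the authors' script: the function name and toy coordinates are hypothetical, and it assumes that every residue pair within the cutoff counts as a contact, including sequence neighbors, since the text does not state an exclusion.

```python
import math


def contact_parameters(coords, cutoff=8.0, long_range_gap=12):
    """Return (contact degree, contact order, long-range contact degree).

    coords holds one (x, y, z) tuple per residue, in sequence order
    (beta-carbon, or alpha-carbon for glycine). Assumption: all pairs
    i < j within the cutoff are contacts, including adjacent residues.
    """
    n = len(coords)
    contacts = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) <= cutoff
    ]
    degree = len(contacts)
    separation_sum = sum(j - i for i, j in contacts)
    # Contact order: summed sequence separation / (contact degree * residues).
    contact_order = separation_sum / (degree * n) if degree else 0.0
    # Long-range contact degree: contacts separated by > 12 residues, per residue.
    long_range = sum(1 for i, j in contacts if j - i > long_range_gap) / n
    return degree, contact_order, long_range


# Toy example: five residues on a straight line, 3.8 Å apart (not a real protein).
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
degree, contact_order, long_range = contact_parameters(toy_coords)
```

In the toy chain, residues one or two positions apart are within 8 Å, so there are seven contacts and no long-range contacts.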

Paratope Connectivity

We hypothesize that less connected and more flexible paratopes will be more accepting of the diversification required to obtain binding function by limiting the destabilization of the entire protein. Contacts were calculated between paratope residues and conserved residues within 8 Å. Normal mode analysis[52,53] was used to estimate the flexibility of the paratope as determined by its connectivity to the remainder of the protein. Three parameters are calculated: (4) paratope contact degree, the number of contacts between a paratope residue and a conserved residue; (5) paratope contact order, the sum of paratope contacts’ difference in primary sequence index, normalized by paratope contact degree and the number of paratope residues; and (6) paratope stiffness, the average of the z-score transformed mean mechanical stiffness spring constant of paratope residues’ α-carbons calculated by an anisotropic network model;[38] high stiffness suggests a less flexible and more connected residue.

Conserved Surface Area Chemical Nature

We hypothesize that the type of conserved exposed surface area will affect protein scaffold stability. The solvent accessible surface area (SASA), as determined by the radius of a water molecule in PyMOL, was summed for each residue based upon chemical nature. Chemical categorization led to three parameters: (7) charged (D, E, K, R) SASA, which may aid in protein stability by creating surface intramolecular salt bridges; (8) hydrophobic (A, F, G, I, L, M, P, V) SASA, which is likely destabilizing because of the entropic cost of solvation; (9) polar (C, H, N, Q, S, T, W, Y) SASA, which may contribute to stabilization in polar solvents.

Paratope Size and Topology

We hypothesize that two large and spatially close paratope regions will maximize the binding surface and increase the total energetics of binding toward the molecular target. Three parameters were based upon 3D structural data: (10) paratope angle, the [paratope 1: entire protein: paratope 2] angle based upon the atomic centers of volume; (11) paratope SASA, calculated after mutating all paratope residues to alanine in PyMOL; (12) paratope separation, the distance between the atomic centers of volume of the paratopes. A 2D projection, created by modifying PyMOL’s depth cue, fog, and lighting, was also used for two 2D parameters: (13) projected paratope area, the sum of the pixels containing the paratope residues’ projection, and (14) projected paratope perimeter, the number of paratope pixels bordered by a nonparatope pixel. To obtain the 2D projections, the protein was rotated to determine the projection with the maximum area of the paratope. The background and conserved residues are colored black with the paratope colored white. A ray-traced image is generated, and the pixel intensity is counted using the Python Imaging Library. Both area and perimeter were normalized by the pixel area of a pseudoatom placed at the center of the paratope regions.
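The two projected parameters reduce to pixel counting on a black-and-white image. The sketch below is only an illustration under stated assumptions: the image is already a binary mask (1 = paratope, 0 = background or conserved residue), a pixel is "bordered" if any of its four edge-sharing neighbors is nonparatope, and pixels beyond the image edge count as nonparatope. The function name is hypothetical, and the pseudoatom normalization is omitted.

```python
def projected_area_and_perimeter(mask):
    """Count paratope pixels (area) and paratope pixels touching a nonparatope pixel."""
    rows, cols = len(mask), len(mask[0])

    def is_paratope(r, c):
        # Pixels outside the image are treated as nonparatope (assumption).
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            area += 1
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(not is_paratope(nr, nc) for nr, nc in neighbors):
                perimeter += 1
    return area, perimeter


# Toy mask: a 3x3 paratope patch centered in a 5x5 image.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
area, perimeter = projected_area_and_perimeter(toy_mask)
```

Only the central pixel of the toy patch is fully surrounded by paratope pixels, so eight of the nine pixels count toward the perimeter.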

Computational Stability

We hypothesize that protein stability will impact mutational tolerance[50] and sought to computationally estimate stability based upon existing correlations. Three parameters were calculated: (15) buried nonpolar surface area (buried NPSA),[41] the sum of solvent exposed nonpolar amino acids in Gly-X-Gly[54] minus the sum of solvent exposed nonpolar amino acids in the folded protein; (16) FoldX DDG, the mean difference in force field energy between mutant and parental variants; and (17) FoldX Energy, the mean force field energy of predicted scaffold mutants. For FoldX calculations, 50 variants randomly selected from an NNK distribution were simulated by FoldX 4,[40] which is sufficient to obtain a 5.1% average coefficient of variation (n = 3 sets of 50 variants).

General Scaffold Properties

We hypothesize that additional factors, which are not explicitly included in categories above, may also impact scaffold performance. Three factors were included: (18) new SASA, the amount of new SASA of scaffold residues after unstructured tails are removed; (19) secondary structure percent, the percentage of scaffold residues categorized as part of an α-helix or a β-sheet; and (20) size, the number of residues in the scaffold after removal of nonsecondary structured termini.

Binder Discovery

We first sought to select proteins with small size, strong computed mutational stability, large and spatially proximal paratopes, minimal newly exposed SASA upon terminal trimming, and a small ratio of perimeter² to area for the projected paratope. The weights for each factor were randomly assigned, and 24 scaffolds were selected for testing from the 619 initial candidates: 8 containing α-helices, 8 containing β-sheets, and 8 containing both secondary structures. Twenty-four scaffolds were chosen to balance breadth of parental proteins and experimentally achievable depth of scaffold variants. Seven of the 24 synthesized libraries had fewer than 3 of 10 clones matching the design and were removed from the study. Genetic combinatorial libraries were synthesized to encode the 17 scaffolds with full amino acid diversity at the paratope sites encoded via NNK codons. Oligonucleotides for these libraries were purchased from LabGenius. Genes were amplified via PCR (200 μL, 1 μM primers, 200 μM dNTPs, 10 U Taq Polymerase, 1× ThermoPol Buffer, 0.5 μM template gene, 30 cycles) and concentrated via ethanol precipitation with PelletPaint (Millipore Sigma). Yeast display plasmid providing an N-terminal Aga2p, an HA epitope, a flexible (G₄S)₃ polypeptide linker, and a C-terminal AU5 epitope (pCT-AU5) was produced in NEB5α E. coli (New England Biolabs) and purified via silica spin column (Epoch Life Science) according to the manufacturer’s protocol. The vector was linearized via restriction digest with NdeI, PstI-HF, and BamHI-HF (New England Biolabs). Digested vector was ethanol precipitated and resuspended in deionized water. For each scaffold, 6 μg of digested vector and all ethanol-concentrated genes were transformed into Saccharomyces cerevisiae yeast (EBY100) via homologous recombination. 
Transformation followed previously described protocols,[55] with the addition of 30% v/v PEG 8000 in step 39, which was found to increase transformation efficiency.[56] Transformed sequence diversity was estimated by dilution plating onto selective media assuming all transformants were unique. Anti-AU5 antibodies failed to isolate full-length display constructs; thus, nonsense sequences were obtained during sequencing but omitted from analysis. The 17 scaffold yeast libraries were grown and induced as previously described,[55] and 10× the transformed diversity of each sublibrary was mixed to create a pooled library. For each round of magnetic-activated cell sorting (MACS), induced yeast were rotated with magnetic beads for 2 h at 4 °C and placed on a magnet for 5 min to isolate binding variants. Each round of MACS consisted of depletion sorts on two negative targets followed by enrichment on positive target beads. For depletion sorts, nonbinding yeast were collected for the next sort and binding yeast were plated for quantification. For enrichment sorts, the bound yeast were collected and grown for subsequent rounds. Yeast binding to both positive and negative target beads were washed with 1 mL of PBSA (1× phosphate-buffered saline with 1 g/L bovine serum albumin), once for the first two rounds and thrice for additional rounds, and resuspended in selective growth media. A diluted fraction was plated for quantification. Positive selectivity (more yeast binding to positive target beads relative to negative target beads) was found after four to five rounds of MACS based upon plated recovery. A variety of protein targets were used to represent the diversity of potential molecular targets of protein scaffolds. Biotinylated green fluorescent protein (GFP) and Gaussia princeps luciferase (luciferase) were purchased from Avidity. Biotinylated human PD-1 extracellular domain and human CTLA4 extracellular domain were purchased from G&P Biosciences. 
Biotinylated R-phycoerythrin (PE) was purchased from AssayPro. Biotinylated human VEGF121 was purchased from ACROBiosystems. Protein targets were either added to Dynabeads Biotin Binder (ThermoFisher) or Dynabeads M-270 Carboxylic Acid beads, as described below. For selections on carboxylic acid beads, counter-sorts included bare carboxylic acid beads, tris(hydroxymethyl)aminomethane (Tris)-quenched carboxylic acid beads, or Dynabeads Protein A (ThermoFisher). For selections on avidin-coated Biotin Binder beads, counter-sorts included bare avidin beads and biotinylated goat IgG (Rockland Immunochemical) on avidin beads. Campaigns 1–3 were completed with 16.5 pmol/bead biotinylated protein targets conjugated to avidin beads. Campaigns 4–7 were completed with 33 pmol/bead targets conjugated to avidin beads for the first and third rounds and to carboxylic acid beads for the second and fourth rounds (and fifth round for campaign 4). Campaigns 1, 5, 6, and 7 isolated binders toward luciferase, GFP, PE, and VEGF121, respectively. Campaigns 2, 3, and 4 isolated binders toward CTLA4/Avidin, PD-1/Avidin, and CTLA4/Tris, respectively. Though binding was observed toward two molecules, the specificity over a third negative target signifies an enriched population with binding functionality. For avidin-based sorts, 10 μL of beads were mixed with 5 or 10 μL of 3.3 μM target in 100 μL of PBSA; beads were rotated at room temperature for 1 h, isolated via magnet, aspirated, and washed with 1 mL of PBSA before cells were added to the tube. For carboxylic acid sorts, the manufacturer’s two-step coating protocol (without NHS) was followed except for the following modification: 2 μL of beads were used for each target to match total beads to avidin sorts.
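As background for the NNK diversification used in these libraries, the expected amino acid distribution of a single NNK codon (N = A/C/G/T, K = G/T) can be enumerated directly from the standard genetic code. The sketch below is illustrative only; the variable and function names are hypothetical, and it assumes equal representation of all 32 NNK codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
INDEXED_BASES = list(enumerate(BASES))
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(INDEXED_BASES, repeat=3)
}


def nnk_frequencies():
    """Expected residue frequencies for one NNK codon ('*' marks a stop codon)."""
    codons = [n1 + n2 + k for n1, n2, k in product("ACGT", "ACGT", "GT")]
    counts = Counter(CODON_TABLE[codon] for codon in codons)
    total = len(codons)
    return {residue: count / total for residue, count in counts.items()}


freqs = nnk_frequencies()
```

The 32 NNK codons encode all 20 amino acids plus a single stop codon (TAG); leucine, serine, and arginine are each encoded by three codons, while residues such as tryptophan are encoded by one.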

Evaluation of Binder Performance via Deep Sequencing

DNA encoding scaffolds was isolated from yeast using Zymolyase (Zymo Research). Briefly, 1 × 10⁸ cells are incubated in 200 μL of lysis solution (50 mM phosphate buffer, 1 M sorbitol, 10 mM β-mercaptoethanol, and 75 U/mL zymolyase longlife) for 30 min at 37 °C, after which DNA is extracted via silica spin column. PCR addition of Illumina adapters was performed to sequence scaffold genes in the initial and binding pools using Illumina MiSeq. Sequences were filtered using PANDASeq[57] with a confidence threshold value of 0.9 for primer and assembled reads. Scaffold identification was completed via custom MATLAB scripts available on GitHub. Briefly, sequencing reads were translated and filtered for sequences matching 70% of the (G₄S)₃ linker and AU5 tag. The scaffold was identified by sequences of the same length and 70% match of conserved residues. Unique sequence counts were based upon translated sequences. Three independent sequencing runs of the initial unsorted pool were completed, with at least 10 000 scaffold variants identified in each sample. The distribution of paratope residues reasonably matched the intended NNK diversity (median absolute deviation = 1.2%, Figure S7). The conserved residues had a mutational rate of 1.1%. To determine the distribution of sequences analyzed, the Hamming distance was calculated between all observed sequences. Comparison to computationally simulated NNK sequences indicated diverse sequence sampling with 15 of 17 libraries not significantly more clustered in sequence space than designed (Figure 5A, P > 0.05, one-tailed Kolmogorov–Smirnov test with Bonferroni correction for multiple comparisons). Binding populations were individually barcoded and sequenced, yielding 280 000 full-length reads across the seven binding populations. The binding performance of each scaffold is a function of the number of unique binders and the strength of binders. 
However, utilizing the raw read counts leads to descriptions of binding pools dominated by the strongest binding variants. One method of balancing diversity and binding functionality is exponential dampening.[43] Therefore, the number of reads for each unique sequence was quartic-root dampened (a subjective balance to reward clonal performance while dampening dominant clones to provide information from diverse clones), and the abundance of a scaffold is the total fraction of dampened reads per molecular target. To account for differences in starting abundance, the final binding performance metric was calculated as the mean difference in abundance for the seven scaffolds. It should be noted that the binding performance metric is dependent on the other scaffolds assayed, yet it still provides a relative performance between scaffolds. To estimate a threshold value of useful binding performance, scaffolds were classified by the ability to develop a high-affinity binding variant with >1% campaign abundance (A, E, H, J, K, N, O). A receiver operating characteristic curve was used to determine a binding performance threshold of −0.006 (Figures S1 and S2).
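The quartic-root dampening step can be sketched as below. This is a minimal illustration rather than the authors' MATLAB implementation: the function name, data layout, and read counts are hypothetical, and only the per-campaign abundance (not the final difference from the initial pool) is computed.

```python
def scaffold_abundance(read_counts):
    """Fraction of quartic-root-dampened reads belonging to each scaffold.

    read_counts maps scaffold -> {unique sequence: read count} for one campaign.
    """
    dampened = {
        scaffold: sum(reads ** 0.25 for reads in sequences.values())
        for scaffold, sequences in read_counts.items()
    }
    total = sum(dampened.values())
    return {scaffold: value / total for scaffold, value in dampened.items()}


# Hypothetical campaign: scaffold A has one dominant clone, B has several modest clones.
campaign = {
    "A": {"seq1": 10000, "seq2": 1},
    "B": {"seq3": 16, "seq4": 16, "seq5": 81},
}
abundance = scaffold_abundance(campaign)
```

With raw counts, scaffold A would hold about 99% of reads; after dampening it holds 11/18 (about 61%), so the three modest clones of scaffold B still contribute information.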

Evolutionary Model

With more calculated parameters than experimental data points (i.e., scaffolds), we sought to reduce the scaffold parameter space and avoid overfitting of a predictive model. We believe that some calculated parameters may be correlated and hypothesized we could describe the scaffolds using a smaller dimensional space of underlying features. Reconstructive independent component analysis (ICA) attempts to identify features by separating the data set into mutually independent latent variables.[58] ICA requires a whitening transformation of data to remove correlation, which was achieved via principal component analysis (PCA). PCA can be used to reduce dimensionality by describing scaffolds with orthogonal metavariables, which removes low order correlations.[59] Broadly, ICA describes features of protein scaffolds, whereas PCA describes features that best differentiate protein scaffolds. The calculation of the parameters was finalized and calculated for 787 protein scaffold candidates via scripts available on GitHub. All parameters were calculated via a deterministic algorithm with a singular result per scaffold, except for FoldX calculations described above which were performed on random library variants. Principal components were then calculated via singular value decomposition using the pca function in MATLAB’s Statistics and Machine Learning Toolbox. The first six components, which individually explained at least 5% of the variance in scaffold parameters with a sum of 80% total explained variation, were retained to predict scaffold performance (Figure S3). Independent components were then obtained via a modification of ICA with a reconstructive cost using the rica function in MATLAB (Figure S4). We then sought to determine which of the independent components best predicted scaffold binding performance. 
Regularization is a technique used to remove parameters that are not predictive of a desired characteristic.[60] A penalty term included in the objective function, associated with the norm of term coefficients, prevents overfitting of data by driving the coefficients of noisy inputs to zero. The six independent components for the 17 experimentally tested scaffolds were used to predict the observed binding performance using the MATLAB regularization function lassoglm with leave-one-out estimation of deviance. Elastic net regularization was performed with various penalty calculations of the L1/L2 norm (α = 0.01, 0.1, 0.25, 0.5, 0.75, 1) and maximum number of model terms allowed (DFmax = 1–6). The performance of the regularization output was tested via leave-one-out prediction of the assayed scaffolds. The model with the lowest root-mean-squared error of binding performance prediction was identified. MATLAB scripts for ICA/PCA analysis and regularization can be found on GitHub. The ability of the predictive model to identify functional scaffolds was based upon the threshold determined by the ability to develop strong binding variants.
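For reference, the elastic net problem solved by lassoglm can be written in the form given in MATLAB's documentation. Here N is the number of observations (the 17 scaffolds), λ is the penalty strength, and α is the L1/L2 mixing parameter scanned above; the symbols β₀ and β (the intercept and the coefficients of the independent components) are notation introduced for this sketch.

```latex
\min_{\beta_0,\,\beta}\;
  \frac{1}{N}\,\mathrm{Deviance}(\beta_0, \beta) + \lambda\, P_\alpha(\beta),
\qquad
P_\alpha(\beta) = \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1
```

Setting α = 1 gives the pure lasso (L1) penalty, which drives coefficients exactly to zero, while values of α near 0 approach ridge (L2) regression.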

Protein Production

Genes encoding observed and parental scaffold variants were obtained from Twist BioScience. Genes were ligated into pET production plasmids with a C-terminal His₆ tag and transformed into T7 Express Competent E. coli (New England Biolabs) following the manufacturer’s protocol. Cells were induced at 37 °C for 2 h with 0.5 mM isopropyl β-d-1-thiogalactopyranoside, pelleted, and frozen. The cells were then lysed in 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) lysis buffer (50 mM HEPES, 5 mM CHAPS, 25 mM imidazole, 2 mM MgCl₂, 20 mM NaCl, 7 U/μL benzonase, 50 mg/mL lysozyme, EDTA-free protease inhibitor, and 5% v/v glycerol) and incubated at 37 °C for 30 min before centrifugation and isolation of the soluble fraction. Protein purification was performed using HisTrap HP columns on an ÄKTAprime plus (GE Healthcare) with wash buffer (20 mM HEPES, 500 mM NaCl, 20 mM imidazole, pH 7.4) and elution buffer (20 mM HEPES, 500 mM NaCl, 500 mM imidazole) flowed at 1 mL/min. To quantify protein via ELISA, 100 μL of soluble lysate fraction was incubated in a 96-well plate overnight at 4 °C, washed 4× with 0.05% v/v Tween 20 in PBS via squirt bottle, and patted dry. Plates were incubated in 100 μL of 0.1 μg/mL Anti-6X His tag HRP antibody (ab1187, Abcam) in PBS for 1 h at room temperature, washed 4×, and treated with 100 μL of 3,3′,5,5′-tetramethylbenzidine (TMB) for 15 min, followed by 100 μL of TMB Stop Solution (ThermoFisher). His-tagged protein abundance was measured via absorbance at 450 nm using a plate reader. Known purified biotinylated protein was spiked into lysate without His-tagged protein to quantify the limit of detection: 2 mg of protein per liter of bacterial culture. Produced protein was identified via PAGE gel, with and without nickel column purification, or via an anti-His₆ ELISA compared against a non-His-tagged control protein. 
NuPAGE Bis-Tris gels were used to identify the appearance of a protein band at the expected molecular weight, based upon a protein standard, following the manufacturer’s protocol.

Proteolytic Resistance

Genes encoding observed and parental scaffolds were transformed into a yeast surface display construct with N-terminal HA and C-terminal V5 epitope tags (pCT-V5) as described above, except gene preparation was performed via 400 μL PCR using Phusion polymerase (New England Biolabs). A total of 1 × 10⁶ yeast induced to display protein were incubated in 50 μL of PBSA with 0, 4 × 10⁻⁶, or 22 × 10⁻⁶ U/μL proteinase K at 37 °C for 10 min and immediately washed with cold PBSA. Epitope tags were labeled with chicken anti-HA antibody (ab9111, Abcam) and mouse anti-V5 antibody (ab27671, Abcam), followed by AlexaFluor488-conjugated goat antichicken IgY (H+L) (Thermo Fisher Scientific) and AlexaFluor647-conjugated goat antimouse IgG (H+L) (Thermo Fisher Scientific). Labeling was performed as follows: 1 × 10⁶ cells were rotated for 30 min at room temperature in 50 μL of PBSA with 1 ng/μL primary antibodies, pelleted at 8000g for 1 min, aspirated, washed with 1 mL of PBSA, and incubated for 20 min at 4 °C in 50 μL of PBSA with 1 ng/μL secondary antibody; cells were then pelleted, washed, and resuspended at 2 × 10⁷ cells/mL in PBSA for fluorescence-activated cell sorting (FACS). Cells were sorted into four gates (bins) based upon the C-terminal:N-terminal epitope signal ratio, with a low ratio suggesting full cleavage of the protein. Collection bin 3 corresponds to intact protein, and collection bin 0 corresponds to fully cleaved protein. Scaffold plasmids were extracted with Zymolyase and PCR amplified with extension to add Illumina adapters as described above. Two experimental replicates were sorted and separately sequenced using Illumina HiSeq and processed using USearch[61] by filtering for a maximum 5% error rate per read and matching to ordered proteins. The mean collection bin of each protein was calculated for all three protease concentrations. For fully displayed proteins without protease, a line was fit with a fixed intercept corresponding to the no-protease collection bin. 
A zero slope indicates no decrease in mean collection bin (epitope signal ratio) with increasing protease concentration and suggests protease stability. The normalized deviation (magnitude trial difference average/range) across trials is 0.11 (Figure S9).
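The fixed-intercept line fit can be illustrated as follows. This sketch assumes an ordinary least-squares fit with the intercept pinned to the no-protease mean collection bin, which the text does not state explicitly; the function name and the example bin values are hypothetical.

```python
def fixed_intercept_slope(concentrations, mean_bins):
    """Least-squares slope of mean collection bin versus protease concentration,
    with the intercept fixed at the mean bin measured without protease."""
    pairs = list(zip(concentrations, mean_bins))
    intercept = next(y for x, y in pairs if x == 0)
    numerator = sum(x * (y - intercept) for x, y in pairs)
    denominator = sum(x * x for x, _ in pairs)
    if denominator == 0:
        raise ValueError("at least one nonzero protease concentration is required")
    return numerator / denominator


# Protease concentrations in units of 1e-6 U/uL, as used in the assay.
protease = [0, 4, 22]
stable_slope = fixed_intercept_slope(protease, [3.0, 3.0, 3.0])    # hypothetical stable variant
unstable_slope = fixed_intercept_slope(protease, [3.0, 2.6, 0.8])  # hypothetical unstable variant
```

A slope near zero corresponds to a variant that stays in the intact bin as protease increases, while a more negative slope indicates faster cleavage and lower proteolytic stability.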
View more
