
CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions.

Mark A Hallen^1,2, Bruce R Donald^1,3,4.

Abstract

MOTIVATION: When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence.
RESULTS: We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility.
AVAILABILITY AND IMPLEMENTATION: Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor . CONTACT: mhallen@ttic.edu or brd+ismb17@cs.duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2017 PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Protein design algorithms (Donald, 2011; Lippow and Tidor, 2007; Regan, 1999) address the following problem: given a protein system and a set of possible localized changes in chemical composition, choose the combination of changes that will optimize a desired functional property. Typically the chemical changes are mutations in sequence or modification of a ligand, while the functional requirement is ligand binding affinity (Floudas ; Georgiev ; Karanicolas and Kuhlman, 2009; Lilien ), protein stability (Desmet ; Donald, 2011; Gainza ; Georgiev ; Kuhlman and Baker, 2000), or some combination thereof (Hallen and Donald, 2016; Lewis ). Solving this problem requires the ability to accurately model protein structure, as binding affinity is sensitive to small changes in the conformation of the protein and ligand.
Two approaches are currently employed for protein structure modeling and coupling it to sequence optimization. First, molecular dynamics can be used to simulate the behavior of a candidate design over time (Rapaport, 2004). This approach has the advantage that it can explore all conformational degrees of freedom. However, these simulations are time consuming and must be run separately for each candidate, making them prohibitively expensive for large sequence spaces. For example, a molecular dynamics-based design considering all 20 amino-acid types for each of 10 residues will require $20^{10} \approx 10$ trillion simulations, which is clearly intractable. Indeed, accurately computing the binding constant for a single sequence is relatively time-consuming, since the timesteps are on the order of femtoseconds while the timescale of ligand binding is many orders of magnitude greater. Other loop modeling methods, such as POOL (Tripathy ), that search extensively over the backbone conformational space of a protein loop also limit their search to a single sequence (Donald, 2011).
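The size of the sequence space quoted above follows from one independent choice of amino-acid type per mutable residue. As a minimal arithmetic check (the function and constant names are illustrative and not part of any published code):

```python
# Size of the sequence space when each of 10 mutable residues
# may take any of the 20 natural amino-acid types.
AMINO_ACID_TYPES = 20
MUTABLE_RESIDUES = 10


def sequence_space_size(num_types: int, num_positions: int) -> int:
    """Number of distinct sequences: one independent choice per position."""
    return num_types ** num_positions


n_sequences = sequence_space_size(AMINO_ACID_TYPES, MUTABLE_RESIDUES)
print(f"{n_sequences:,} sequences (~{n_sequences / 1e12:.1f} trillion)")
```

Because the count grows exponentially with the number of mutable residues, any method that spends a separate simulation on each sequence becomes impractical very quickly.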
This brings us to the second approach, consisting of combinatorial algorithms that search a much larger sequence space without considering each sequence separately: the time cost scales sublinearly in the number of candidate sequences. This is important because the number of sequences is exponential in the number of mutable residues. Several classes of methods fall under this approach, as reviewed extensively in Donald (2011) and Gainza . Methods based on the DEE/A* algorithm (Desmet ; Gainza ; Georgiev ; Gordon ; Hallen ; Leach and Lemon, 1998; Pierce ), on branch- (Jou ) and tree decompositions (Xu and Berger, 2006), and on algorithms from integer linear programming (Kingsford ; Roberts ) and weighted constraint satisfaction (Roberts ; Traoré , 2016) offer provable guarantees of accuracy, while methods based on simulated annealing (Das and Baker, 2008; Kuhlman and Baker, 2000; Wang ) and genetic algorithms (Desjarlais and Handel, 1995; Lewis ; Leaver-Fay ) do not. Although the technique we present in this work could be used with most of these methods in principle, we have implemented it in a framework based on the DEE/A* algorithm, which we will now explain further. Using a provable algorithm with our new model ensures that empirical observations of accuracy precisely reflect the accuracy of the model, rather than a convolution of modeling and algorithm accuracy.
DEE/A* was first presented as a method to optimize protein stability while modeling only sidechain flexibility (Leach and Lemon, 1998). Protein sidechain flexibility is known empirically to consist almost entirely of flexibility in sidechain dihedral angles, which are restricted to certain regions of dihedral space. These regions, termed rotamers, have been characterized for each natural amino-acid type (Lovell ) by clustering of sidechain dihedral values for many residues of each type across many different high-resolution crystal structures.
DEE/A* provided an efficient way to assign an amino acid type and rotamer to each residue in a protein to minimize energy. Initially, DEE/A* assumed every residue would only be found at the ‘ideal’ dihedral values for its rotamer (the modal values for that rotamer in crystal structure data). Later work helped to relax this assumption. The minDEE algorithm (Georgiev ; Roberts ) enabled search over sequence and conformational space with each sidechain dihedral restricted to a continuous range (an ideal rotameric value ± 9°), instead of to an ideal rotameric value exactly. The energy minima over this larger, more realistic sidechain conformational space have been shown to be significantly lower (Gainza ). The iMinDEE (Gainza ) and EPIC (Hallen ) algorithms sped up minDEE substantially while using the same modeling assumptions, and other extensions added the capability to model sidechain conformational entropy (Chen ; Donald, 2011; Georgiev ; Lilien ; Roberts ) and backbone motions (Georgiev and Donald, 2007; Georgiev ; Hallen ), while still exploiting the speedups iMinDEE and EPIC offer.
Previous combinatorial protein design algorithms have also incorporated backbone flexibility, albeit to a limited extent. The BD algorithm (Georgiev and Donald, 2007) can allow motions in all backbone dihedrals (ϕ and ψ), but these motions are propagated down the entire backbone chain, which severely limits the extent to which the backbone in the region of interest (e.g. active-site loop) can move without unfolding the protein (generally to 1 Å). Modeling larger changes would require either handling dramatic backbone movement elsewhere in the protein or facing the ill-conditioned problem of making dihedral changes in subsequent residues cancel each other’s downstream effects. The new parameterization we present here makes the latter problem well-conditioned, by using an intrinsically local set of internal coordinates.
Another previous model for backbone flexibility in protein design is the use of a restricted repertoire of motions that may move the backbone more, but do not search all biophysically feasible motions even locally. These can be ad hoc, discrete backbone changes specific to a particular protein system (e.g. from antibody loop libraries (Al-Lazikani )), transplantations of fragments of other proteins’ backbones (Jacobs ; Zhou and Grigoryan, 2015), or backbones generated by molecular dynamics simulations (Fung ). Alternately, the repertoire can contain motions like the backrub (Davis ) and shear (Hallen ) that have been observed repeatedly in crystallographic alternates. The backrub (Davis ) in particular has been used in both DEE/A*-based (Georgiev ; Hallen ) and simulated annealing-based (Smith and Kortemme, 2008) protein design algorithms. The DEEPer algorithm (Hallen ) performs a provably complete search over the space defined by a set of possible mutations and a predefined repertoire of backrubs, shears and/or local discrete backbone perturbations.
Indeed, some restriction on backbone flexibility is acceptable in the protein design context, because we know from X-ray crystallography that backbone conformational changes due to mutations or ligand binding are usually fairly local (Al-Lazikani ; Wong ). We also know that backbone motions are mostly limited to changes in the two dihedral angles ϕ and ψ of each residue, and that these dihedrals are restricted to a small subset of their possible values (Lovell ). This subset is known as the Ramachandran-allowed region and is well-characterized for each amino acid type (Lovell ), analogously to how sidechains are generally restricted to rotamers. Thus, the set of feasible backbone conformational changes can be characterized in the space of ϕ and ψ changes in the flexible region by imposing both inequality (Ramachandran) constraints, and holonomic (i.e. equality) constraints that ensure the non-flexible regions of the backbone do not move. Without the latter, significant ϕ and ψ changes would unfold the protein, because the amount of atomic motion due to a backbone dihedral change increases for atoms that are further from the axis of the dihedral rotation. Nevertheless, previous combinatorial protein design algorithms restrict the backbone substantially more than these empirical limits on flexibility would require.
In the present work, we use a new parameterization of backbone conformational space to obtain a much more systematic search over the continuous space of local conformational changes. Any differential motion in a specified region of the backbone that is accessible by changing the backbone dihedrals ϕ and ψ can be accessed via our parameterization (Fig. 1). Our parameterization is designed for use in continuous energy minimization with box constraints on all degrees of freedom (Gainza ; Hallen ). Thus, we need not explicitly include holonomic constraints when using our parameterization; our parameterization intrinsically does not move the regions of protein backbone that need to be kept fixed. This parameterization allows us to use polynomial approximations (Taylor series) to efficiently evaluate the continuous backbone movements around a reference backbone. We thus provide a fast method to compute all atomic coordinates as a function of our novel degrees of freedom, by calculating Coordinates of Atoms by Taylor Series (CATS). We have integrated CATS with the iMinDEE (Gainza ) and EPIC (Hallen ) protein design algorithms, which call such continuous minimization as a subroutine. CATS casts the modeling of localized, continuous backbone dihedral flexibility into a form that supports all operations required by iMinDEE and EPIC.

Fig. 1

Backbone degrees of freedom used by CATS. (A) A voxel used in CATS for a 7-residue loop in ponsin (PDB id 2O9S (Gehmlich )), projected into the 2-D space of two of our new continuous degrees of freedom. Voxel border, blue; central conformation, black. (B) Conformations in the voxel: black, central conformation; red and green, conformations shown as dots in A; purple, a conformation for which all 8 degrees of freedom are at the voxel edge. (C, D) The boundary of the 2-D-projected voxel shown in A, graphed in the space of atomic Cartesian x coordinates (in Å) for the N and Cα atoms of E856 (C) and in the space of that residue’s backbone dihedrals (in degrees, D). For this 7-residue loop, the voxel has 8 dimensions and thus forms an 8-dimensional hypersurface in the 14-dimensional backbone dihedral space. The distorted parallelogram in (C) would be exactly a parallelogram if the constraints were linear.

We have implemented CATS in the OSPREY (Gainza ; Georgiev , 2009; Ojewole ) open-source protein design package. OSPREY has yielded many designs that performed well experimentally, in vitro (Chen ; Frey ; Georgiev ; Gorczynski ; Roberts ; Rudicell ; Stevens ) and in vivo (Frey ; Gorczynski ; Roberts ; Rudicell ) as well as in non-human primates (Rudicell ), and contains a wide array of flexibility modeling options and provably accurate design algorithms (Gainza ; Georgiev ). These features will allow CATS to be used for many different types of designs.
By presenting CATS, this paper makes the following contributions:
• A new, continuous parameterization of backbone conformational space that includes all degrees of freedom that respect the backbone’s natural geometric constraints.
• An efficient algorithm, CATS, for using this parameterization in protein design.
• An implementation of CATS in our laboratory’s open-source OSPREY protein-design software package (Chen ; Frey ; Georgiev , 2009; Gainza ), configured for use with any of the protein design algorithms in OSPREY (Georgiev ; Gainza ; Hallen , 2015, 2016; Hallen and Donald, 2016; Lilien ; Roberts and Donald, 2015), available for download upon publication as free software.
• Experimental results of computational design calculations that demonstrate CATS finds sequences and conformations that are significantly lower in energy than previous algorithms, across 81 test cases using 29 different crystal structures, including an antibody mutant that resisted modeling by previous algorithms. In the antibody study, CATS models a loop backbone motion that is sterically crucial to the binding activity of a mutant that improves both gp120 binding and HIV-1 neutralization.

2 Materials and methods

2.1 Protein design with continuous flexibility in closed loops

2.1.1 Framework

CATS builds on previous protein design algorithms that model continuous flexibility: iMinDEE (Gainza ) and its variants DEEPer (Hallen ) and EPIC (Hallen ). In this section, we will review some aspects of the mathematical framework underlying these algorithms, which will also serve as the foundation for CATS. We assume that the conformation of the protein is a function of the sequence and n internal coordinates $x = (x_1, \ldots, x_n)$. We then define the conformational space of our system as the union of voxels (Georgiev ; Gainza ; Hallen ). Each voxel v is defined by a protein sequence and the inequality constraints

$$l_i \le x_i \le u_i \tag{1}$$

for $i \in \{1, \ldots, n\}$, where $l_i$ and $u_i$ are voxel-specific constants defined per our modeling assumptions. If $l_i < u_i$, coordinate $x_i$ is said to have continuous flexibility in v. The conformation of each residue j will be a function of only that residue’s amino-acid type and a subset of the degree-of-freedom values $\{x_i \mid i \in D_j\}$, where $D_j \subseteq \{1, \ldots, n\}$. Thus, we can construct a very large voxel space combinatorially. The conformation space of each residue j consists of a limited number (usually <100) of ‘residue-specific’ voxels that bound only the degrees of freedom in $D_j$. Thus, the conformation space of the entire system consists of all possible combinations $v = v_1 \cap v_2 \cap \cdots$ of residue-specific voxels, where v1 is a voxel specific to residue 1, v2 to residue 2, etc. and thus all degrees of freedom of the system are bounded in their finite intersection v. These residue-specific voxels are called residue conformations (RCs) (Hallen ).
As discussed by Hallen , any continuous degrees of freedom can be used in this framework, as long as we can perform efficient and accurate energy minimizations of the form

$$\min_{x \in v} E(x) \tag{2}$$

where $E(x)$ is the energy as a function of the conformational degrees of freedom. We must be able to evaluate Eq. (2) for the entire system and for subsets of it. In the former case, the voxel v will bound all the system’s degrees of freedom and $E$ will be the energy of the entire system. In the latter case, v will only restrict degrees of freedom for a subset A of the residues: v will be of the form $\bigcap_{j \in A} v_j$. Likewise, the energy $E$ will consist only of interactions among those residues, and thus will depend only on the degrees of freedom $\{x_i \mid i \in \bigcup_{j \in A} D_j\}$. Following Georgiev, Gainza and Hallen , we assume local minimization to be sufficient to find the minimum within a voxel, and we perform this minimization with the cyclic coordinate descent algorithm implemented in OSPREY (Chen ; Frey ; Georgiev , 2009; Gainza ).
We also assume the availability of an energy function $E_a: \mathbb{R}^{3m} \to \mathbb{R}$ that maps the coordinates of the m atoms in the system to an energy. We use the implementation of AMBER (Cornell ; Weiner and Kollman, 1981) with EEF1 (Lazaridis and Karplus, 1999) solvation in OSPREY for the purposes of this work, but the iMinDEE framework supports a wide range of energy functions (Georgiev ; Hallen ), and adding CATS to this framework introduces no additional restrictions on the energy function. Having chosen $E_a$, we define $E(x) = E_a(a(x))$, where $a: \mathbb{R}^n \to \mathbb{R}^{3m}$ maps internal coordinates to all-atom coordinates.
As discussed by Hallen , the iMinDEE framework is actually agnostic to the geometric meaning of the degrees of freedom x, as long as (i) each voxel is defined by box constraints, of the form in Eq. (1), and (ii) we know how to compute the kinematic map $a(x)$. The reason iMinDEE and its previously described variants have limited or no backbone flexibility is that holonomic constraints on the backbone dihedrals ϕ and ψ which restrict backbone motion to a specified region of protein backbone (e.g. a flexible loop region) are not box constraints. Our contribution in this paper is a parameterization of backbone conformational space that is equivalent to varying ϕ and ψ subject to these holonomic constraints, but satisfies the conditions (i) and (ii) above.
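The minimization within a voxel described above can be illustrated with a small sketch of cyclic coordinate descent under box constraints. This is not OSPREY's implementation: the golden-section line search, the function names and the quadratic toy energy are all assumptions chosen so the example is self-contained, and a real energy function would be evaluated on atomic coordinates through the kinematic map.

```python
from typing import Callable, List, Sequence, Tuple

Energy = Callable[[List[float]], float]


def golden_section(f: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-8) -> float:
    """Minimize a 1-D function on [lo, hi], assuming it is unimodal there."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                 # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                 # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


def minimize_in_voxel(energy: Energy, lower: Sequence[float],
                      upper: Sequence[float],
                      sweeps: int = 20) -> Tuple[List[float], float]:
    """Cyclic coordinate descent: optimize one coordinate at a time,
    never leaving the box lower[i] <= x[i] <= upper[i]."""
    x = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]  # start at voxel center
    for _ in range(sweeps):
        for i in range(len(x)):
            if lower[i] == upper[i]:
                continue                # no continuous flexibility in x[i]

            def along_axis(t: float, i: int = i) -> float:
                trial = list(x)
                trial[i] = t
                return energy(trial)

            x[i] = golden_section(along_axis, lower[i], upper[i])
    return x, energy(x)


def toy_energy(x: List[float]) -> float:
    """Hypothetical convex energy with two coupled degrees of freedom."""
    return (x[0] - 2.0) ** 2 + (x[1] + 0.5) ** 2 + 0.5 * x[0] * x[1]


x_min, e_min = minimize_in_voxel(toy_energy, [-1.0, -1.0], [1.0, 1.0])
print(x_min, e_min)
```

In this toy case the unconstrained minimum lies outside the voxel, so the first coordinate ends at its upper bound while the second settles at an interior value. The same loop works for any degrees of freedom that satisfy conditions (i) and (ii), which is why a box-constrained backbone parameterization fits into the framework without changes to the minimizer.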

2.1.2 Open and closed loops

For internal coordinates that are sidechain dihedrals, the kinematic map a is well known: the sidechains are just rotated to the correct angles. This is because there is no restriction on the termini of the sidechains. Likewise, defining the voxel in sidechain dihedral space is fairly straightforward: we assume as in Georgiev, Gainza and Hallen that each voxel corresponds to the assignment of a sidechain rotamer (Janin ; Lovell ) to each residue, and each dihedral is allowed to vary by ±9° about the ideal dihedral for the rotamer, which is empirically derived from a database of high-resolution crystal structures (Lovell ). Using sidechain dihedrals as continuous degrees of freedom allows sidechain motions in all directions that keep the bond lengths and angles and backbone conformation fixed.
However, as mentioned in Section 1, backbone conformational changes associated with mutations or binding are generally fairly local; indeed, complex, non-local changes are likely outside the scope of what protein design algorithms can accurately predict. This effectively imposes holonomic equality constraints: we vary ϕ and ψ subject to the constraint that the (user-designated) flexible section of backbone matches the starting structure at both ends of the flexible section. Such equality constraints are incompatible with the iMinDEE framework (Gainza ; Hallen ). To resolve this incompatibility, we reparameterize the backbone conformational space. Moving our new backbone degrees of freedom will allow backbone motions in all directions that do not change the bond lengths, angles and ω dihedrals, while keeping the non-flexible parts of the backbone fixed. We will now describe the assumptions about peptide plane geometry underlying CATS (Section 2.2). We will then use these assumptions to define the new degrees of freedom x and explain how all-atom coordinates are computed from them (Section 2.3).

2.2 Peptide-plane geometry assumptions

The starting point for CATS is a set of assumptions about which backbone degrees of freedom are free to move and which are not. We will assume (iii) that peptide planes are rigid bodies, and (iv) that the N-Cα-C′ bond angle in each residue is fixed. We encode these assumptions as equality constraints in the form

$$c(y) = c_0 \tag{3}$$

where $y$ denotes the nitrogen and alpha-carbon coordinates of the flexible residues, the elements of c are quantities constrained by our geometry assumptions (iii–iv), and the corresponding elements of $c_0$ are the values of those quantities in the starting crystal structure. There are four constrained quantities per residue, and each component of c is a multivariate quadratic function. A detailed description of these constraints and a justification of the assumptions are provided in Supplementary Material (SM) 1. The coordinates of all backbone atoms besides the nitrogens and alpha carbons can be computed from $y$ and the assumption that peptide planes are rigid bodies, as described in SM 1 as well. Once the backbone conformation is determined, the sidechains and alpha hydrogens are placed onto the backbone as in Hallen . These observations greatly simplify the calculation of $a(x)$ from our backbone degrees of freedom x: we need only calculate $y$, and then the other components of $a(x)$ can be computed from $y$.

2.3 New backbone parameterization

To define a voxel in backbone conformational space, we will choose a central conformation and allow backbone motions away from this conformation in all directions that maintain the peptide plane geometry (Fig. 1). For a flexible backbone segment of k contiguous residues with $k \geq 4$, this space of motions has $2k - 6$ dimensions: 2k for the ϕ and ψ dihedrals of each residue, minus 6 for the constraints that ensure that the residue at the end of the segment is continuous with the non-flexible residues after it (since the position and orientation of a rigid body each have 3 degrees of freedom). In the computational experiments described in this work, the central conformation for each voxel will be the crystal structure conformation, since we know it to be favorable and expect that local backbone adjustments around it can be scored energetically more accurately than arbitrary backbone motions can. However, in principle other central conformations could be used, to cover as much of backbone conformational space as desired (albeit at increased computational cost, which could scale up to linearly in the number of voxels in backbone conformational space).
Let y be the vector of nitrogen and alpha-carbon coordinates for the k flexible residues. Let $y_0$ be the value of y at the central conformation. Consider a vector function $f(y)$ such that the leading components of $f(y)$ are the constrained quantities $c(y)$ (see Section 2.2), and the remaining $2k - 6$ components are affine functions of y, which we will call $z(y)$. In other words, $f(y) = (c(y), z(y))$. The components z parameterize the ($2k - 6$)-dimensional hypersurface of constraint-satisfying backbone conformations, and are chosen to be affine for simplicity. As long as the Jacobian $\nabla f$ is nonsingular, any direction of motion b of the nitrogen and alpha-carbon atoms that keeps the constrained quantities c constant corresponds to a direction of motion of the affine components. To put this more formally:
Theorem 1. Let $\nabla_b$ denote the directional derivative in direction b. If $z$ is an affine function and c satisfies $\det \nabla f(y_0) \neq 0$ for $f = (c, z)$, then there exists an affine bijection between the constraint-preserving directions $\{b \mid \nabla_b c(y_0) = 0\}$ and the corresponding directions of motion $\nabla_b z$ of the affine components.
A proof of Theorem 1 is provided in SM 3. Thus we can use the affine components z as our continuous backbone degrees of freedom. We will choose the constant terms of the affine functions so that $z(y_0) = 0$. We can choose the linear coefficients defining z somewhat arbitrarily as long as $\nabla f$ is nonsingular; we will choose the (constant) gradient of each component of z to have norm 1 and to be orthogonal to all other gradients of components of f (evaluated at $y_0$ in the case of the constrained components c, which have non-constant gradient). In other words, we let $z(y) = M(y - y_0)$, where M is a matrix whose rows are orthonormal, and also are orthogonal to the rows of the matrix $\nabla c(y_0)$. In this sense the components $z$ resemble ‘normal modes’ of backbone flexibility (Bahar and Rader, 2005) in the vicinity of the central conformation (though whether they are actual normal modes depends on the energy landscape; our definition of z is intended to be agnostic to the energy function). They are also analogous to the user-controllable degrees of freedom in computer graphics systems that allow image manipulation while maintaining satisfaction of a set of constraints (Gleicher, 1992; Ngo ; Ngo and Donald, 1999).
Now, let $x = z(y)$ denote the vector of backbone degrees of freedom. To evaluate $a(x)$, as is required by the iMinDEE framework, we must evaluate the inverse mapping of f at the correct constrained values: $y = f^{-1}(c_0, x)$. We compute this inverse function efficiently in the form of a Taylor series, whose coefficients we can derive analytically because we can compute all derivatives of f. The Taylor series is valid within a certain neighborhood around the central conformation $y_0$, and we verify its accuracy within that neighborhood by sampling. In the case where there are multiple possible values of $y$ for given values of $x$, we are interested in the branch defined by the Taylor series.
This way, a is a well-defined function mapping values of our new backbone degrees of freedom to constraint-satisfying atomic Cartesian coordinates (Fig. 1). A summary of the algorithm for computing a is given in SM 2 and details of the Taylor series computation are given in SM 5. Thus, we can use these as a set of continuous degrees of freedom to parameterize our backbone conformational space for use in the iMinDEE framework. Finally, we can impose bounds on $x$ to define a voxel, allowing motion away from the central conformation in any direction that satisfies the peptide-plane geometry constraints (Eq. 3).
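The idea of inverting a constraint map by Taylor series can be seen in a deliberately tiny toy problem, which is an assumption made for illustration and not the peptide-plane system of Section 2.2. Take a single point y = (y1, y2) with one quadratic constraint c(y) = y1² + y2² = r², and central conformation y0 = (r, 0). The gradient of c at y0 is (2r, 0), so the unit row M = (0, 1) is orthogonal to it and the single free degree of freedom is x = M(y − y0) = y2. The sketch below derives the Taylor coefficients of y1(x) order by order from the constraint and then checks accuracy by sampling, mirroring the verification step described above. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def taylor_coefficients(r: float, order: int) -> List[float]:
    """Coefficients a_n of y1(x) = sum_n a_n x^n on the branch through
    y0 = (r, 0), obtained by matching powers of x in y1(x)^2 + x^2 = r^2."""
    a = [r] + [0.0] * order
    for n in range(1, order + 1):
        # The x^n coefficient of y1(x)^2 must be -1 for n == 2 and 0 otherwise.
        target = -1.0 if n == 2 else 0.0
        cross = sum(a[i] * a[n - i] for i in range(1, n))
        a[n] = (target - cross) / (2.0 * a[0])
    return a


def coordinates(x: float, coeffs: List[float]) -> Tuple[float, float]:
    """Toy kinematic map: free coordinate x -> Cartesian (y1, y2)."""
    y1 = sum(c * x ** n for n, c in enumerate(coeffs))
    return (y1, x)


def constraint_violation(y: Tuple[float, float], r: float) -> float:
    """Distance of y from the constraint surface (the circle of radius r)."""
    return abs(math.hypot(*y) - r)


radius = 1.0
coeffs = taylor_coefficients(radius, order=6)

# Verify accuracy by sampling the candidate voxel -0.5 <= x <= 0.5.
samples = [-0.5 + 0.05 * i for i in range(21)]
worst = max(constraint_violation(coordinates(x, coeffs), radius) for x in samples)
print(coeffs, worst)
```

The recurrence works because the constraint is quadratic, so each new coefficient appears linearly once the lower-order ones are known. The truncated series stays accurate only near the central conformation, which is why the bounds on the free coordinates have to be chosen with the approximation error in mind.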

3 Results

3.1 Energy differences and backbone shifts

Eighty-one test cases using 29 different crystal structures showed that CATS can make a substantial difference in protein energetics (Fig. 2). Three types of test cases were used: (a) design cases searching a large sequence space, (b) conformational searches for the wild-type sequence and (c) single-voxel minimizations starting from the wild-type backbone and sidechain conformations. In each case, CATS was compared to rigid-backbone design and to DEEPer backbone flexibility (Hallen ). The iMinDEE (Gainza ) and EPIC (Hallen ) search algorithms were used throughout, which have guarantees of accuracy, thus ensuring that energy improvements between the different models of conformational space are actually due to changes in the backbone flexibility model and not to error in the search algorithm. The five to nine flexible residues in each test case were chosen to be a contiguous segment of protein backbone.

Fig. 2

Eighty-one computational experiments comparing CATS, DEEPer and rigid-backbone design. (A) Average improvement in energy (kcal/mol) in CATS (red) and DEEPer (blue) calculations compared to rigid-backbone calculations. Averages with standard error bars shown for designs, wild-type (WT) conformational searches, and single-voxel minimizations starting from the wild-type conformation. (B) RMSD (Å) between crystal-structure backbones and optimal backbones computed by CATS (red) and DEEPer (blue) for the same test cases as (A). CATS is able to model larger backbone changes, and the greater RMSD for designs compared to minimizations indicates CATS is modeling the backbone shifts induced by mutations.

In 87% of designs, 86% of wild-type conformational searches, and 54% of minimizations, the minimum-energy conformation found using CATS was lower than the minimum rigid-backbone energy by at least the thermal energy at room temperature (0.592 kcal/mol, calculated as the universal gas constant times a room temperature of 298 K). This is a rough measure for functional significance (Hallen ). Indeed, in 73% of designs the gap between the CATS and DEEPer minima exceeded this thermal energy. The gap between DEEPer and rigid-backbone minima in designs exceeded thermal energy in 67% of designs, closely matching the result in Hallen . On average, designs had 3.5 kcal/mol better energies with CATS than without backbone flexibility (Fig. 2A). Moreover, designs with CATS often differed in optimal sequence from the corresponding rigid-backbone designs, with CATS favoring larger amino acids in all but one case. Some of these amino acids were dramatically larger: for example, tryptophan replaced methionine 31 in a redesign of high-potential iron-sulfur protein (PDB id 3A38). This reflects CATS’ ability to find space in a protein for larger amino acids that would be sterically infeasible with the original backbone conformation. Thus, CATS greatly improves the modeling of major sequence changes. Notably, the design with the largest backbone motion identified by CATS was in an 8-residue loop in the Dachshund regulatory protein (PDB id 1L8R (Kim )), which had a backbone RMSD of 0.31 Å and improved the energy by 17.1 kcal/mol compared to the original backbone.
As discussed in SM 5, voxel sizes were selected by starting with a 2-Å range (-1 to 1) for each CATS degree of freedom, and then scaling down this range (for all degrees of freedom at once) by a factor of 1.3 repeatedly until RMS constraint violations sank below 0.01 Å. Despite this strict threshold, a ∼1-Å range for each CATS degree of freedom was usually chosen (Supplementary Fig. S2). These voxels are thus centered at the original (crystal structure) backbone conformation, which by construction has a value of 0 for each CATS degree of freedom. Sidechain dihedrals were allowed 9 degrees of motion in either direction from ideal rotameric values, as described previously (Georgiev ; Gainza ; Hallen ). Conformational search over the space defined by these voxels was performed using the EPIC algorithm (Hallen ). Computation times for the CATS designs reported here ranged from less than a minute to eleven days, with a median of 17.6 hours; for wild-type conformational searches the median was 7.9 hours. Further details of all the test cases described in this section are provided in SM 6.
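The voxel-size selection described above (start from a range of -1 to 1 Å, shrink by a factor of 1.3 until the RMS constraint violation falls below 0.01 Å) can be sketched as a short loop. The function name, the `rms_violation` callback and the toy violation model are hypothetical stand-ins; in practice the violation would be measured by sampling the Taylor-series coordinates within the candidate voxel.

```python
from typing import Callable


def choose_voxel_half_width(rms_violation: Callable[[float], float],
                            start_half_width: float = 1.0,
                            shrink: float = 1.3,
                            tol: float = 0.01,
                            max_steps: int = 100) -> float:
    """Shrink the voxel (all degrees of freedom at once) until the RMS
    constraint violation measured within it drops below tol (in Angstroms)."""
    half_width = start_half_width
    for _ in range(max_steps):
        if rms_violation(half_width) < tol:
            return half_width
        half_width /= shrink
    raise RuntimeError("no acceptable voxel size found within max_steps")


def toy_violation(half_width: float) -> float:
    """Hypothetical error model: violation grows steeply with voxel size."""
    return 0.05 * half_width ** 4


half_width = choose_voxel_half_width(toy_violation)
print(f"voxel range: {-half_width:.3f} to {half_width:.3f} "
      f"({2 * half_width:.2f} Angstroms wide)")
```

With this toy error model the loop stops after two shrink steps, giving a range a little over 1 Å wide. The real outcome depends entirely on how the measured violations behave for a given loop.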

3.2 Modeling of Trp 54 mutation in VRC07

The homologous antibodies NIH45-46 and VRC07 both bind with high potency to the HIV surface glycoprotein gp120, and neutralize a broad range of strains of the virus. However, HIV is notorious for mutating to resist the immune system, and thus modified antibodies with increased potency and breadth are of great biomedical interest, both for passive immunization and as a guide for vaccine development. A mutation from glycine to tryptophan at position 54 of NIH45-46 was found to increase breadth and potency significantly (Diskin ). In a previous study, one of us (BRD) and colleagues showed that this mutation increases the breadth and potency of VRC07 as well (Rudicell ). Since then, the question of whether this mutation can be modeled in computational design has been an open problem of considerable interest. Large changes in sizes of sidechains, as in this mutation, are more likely to induce backbone motions and thus more difficult to model computationally. Indeed, modeling this mutation has presented a challenge for previous protein design algorithms. A rigid-backbone conformation search (starting from a VRC07-gp120 complex structure with leucine at position 54 and PDB id 4OLX (Rudicell )) shows extensive clashes with two nearby backbone segments (Fig. 3A). Backrub perturbations (Davis ) to the backbone, which are often used to model previously unobserved backbone changes in extended conformations such as this loop (Georgiev ; Hallen ; Smith and Kortemme, 2008), could not resolve these clashes (Fig. 3B). The provably complete DEEPer algorithm was used to search the space of backrubs, ensuring that a feasible conformation was not missed in the search. Backbone conformational changes can also be modeled using loops transplanted from other structures, and indeed antibody loops have been classified into a list of canonical structures (Al-Lazikani ).
But the crucial backbone motion here is far more subtle than the shifts between canonical structures, and thus is best handled with a continuous approach. Although molecular dynamics techniques can search over all degrees of freedom in a protein, they are unsuitable for large design spaces because a separate simulation would be needed for each sequence. Thus, modeling this sort of backbone motion in the combinatorial protein design context requires a technique for continuous and systematic search of backbone conformations that is compatible with combinatorial protein design algorithms.

Fig. 3

The CATS conformational space for a mutant of the antibody VRC07 includes non-clashing conformations inaccessible to rigid-backbone design. The backbone was either held rigid (A) or allowed DEEPer (Hallen ) (B) or CATS (C) flexibility for five residues. (A–C) Steric clashes between atoms are indicated in pink. (D) The three designs overlaid (rigid backbone in magenta, DEEPer in cyan, CATS in green). (E) Broader view: 15 residues (green, yellow, pink) were allowed continuous sidechain flexibility, of which ten were restrained in an n-dimensional continuous rotamer voxel centered on the original rotamer (n = number of sidechain dihedrals); the segment with backbone flexibility is shown in yellow, and Trp 54 in pink. Designs were run starting from PDB id 4OLX (Rudicell ).

Indeed, CATS resolves this problem, as its conformational space includes a conformation with favorable contacts all around the mutation. Allowing one of the backbone segments that clashes heavily with Trp 54 in rigid-backbone search to relax by CATS resolves the clashes (Fig. 3C), causing a 16 kcal/mol improvement in energy relative to the rigid-backbone search (this is a 9 kcal/mol improvement relative to the DEEPer search). This improvement results from a fairly modest backbone shift: 0.28 Å backbone RMSD for the flexible segment, with per-residue backbone RMSDs up to 0.46 Å (Fig. 3D). The backbone motion modeled by CATS reduces the backbone RMSD of the modeled structure compared to a crystal structure with Trp 54 (PDB id 4OLZ (Rudicell )), from 0.61 Å to 0.46 Å, calculated using the method of Kromann and Bratholm (2013) and Kabsch (1976) for Trp 54 and the two gp120 residues it clashes with in the rigid-backbone model (Trp 427 and Gly 473). However, the RMSD change is somewhat difficult to interpret because independent crystal structures of the same protein would also be likely to exhibit RMSDs around this level.
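
To make the distinction between the segment-wide backbone RMSD and the per-residue backbone RMSDs concrete, the following minimal sketch computes both for two conformations given in the same coordinate frame. It deliberately omits the optimal superposition step (Kabsch) used for the crystal-structure comparison above; the function names and the toy coordinates are hypothetical and are not taken from the paper's data.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Residue = Sequence[Coord]  # backbone atom coordinates of one residue


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized atom lists (no alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("atom lists must be non-empty and of equal length")
    sq_sum = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq_sum / len(a))


def segment_and_per_residue_rmsd(
    ref: Sequence[Residue], model: Sequence[Residue]
) -> Tuple[float, List[float]]:
    """Return (RMSD over all backbone atoms in the segment, list of per-residue RMSDs)."""
    if len(ref) != len(model):
        raise ValueError("segments must contain the same number of residues")
    per_residue = [rmsd(r, m) for r, m in zip(ref, model)]
    all_ref = [atom for res in ref for atom in res]
    all_model = [atom for res in model for atom in res]
    return rmsd(all_ref, all_model), per_residue


# Toy two-residue segment: only the second residue moves (by 0.5 Å).
ref = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
model = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(2.3, 0.4, 0.0), (3.3, 0.4, 0.0)]]
segment_rmsd, per_residue = segment_and_per_residue_rmsd(ref, model)
print(segment_rmsd, per_residue)
```

Because the segment-wide value averages squared deviations over all atoms, it is lower than the largest per-residue value whenever the motion is concentrated in a few residues, which is the pattern reported for the flexible segment here.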
These results show the key role that local backbone flexibility, as modeled by CATS, can play in identifying favorable conformations and sequences. They also show that CATS can perform designs that could not be modeled using previous algorithms, and that the level of backbone flexibility modeled by CATS is functionally significant, resulting in a qualitatively different conformational space. In particular, CATS reveals how a mutant that rigid-backbone computations dismiss as sterically infeasible can actually bind its target well.

4 Conclusions

CATS is a novel and systematic method to search substantial, continuous regions of backbone conformational space during protein design calculations. By moving away from fixed repertoires of motions and into comprehensive search of conformations with valid bonding geometry, it moves closer to fully realistic modeling of backbone conformational changes. A key challenge as we move into these larger spaces is ensuring that the energetic cost of the backbone conformational changes is estimated accurately enough by the energy function to yield useful results. But CATS can play an important role in addressing this challenge as well. CATS enables provably accurate algorithms, which introduce no new error beyond the error in the model, in contrast to stochastic, heuristic approaches that have been shown to drastically undersample the conformational space specified by the model (Gainza ; Simoncini ). As a result, CATS can be used to validate energy functions in the highly backbone-flexible designs it enables, with the guarantee that error in design predictions is due only to error in the energetic and geometric modeling and not to error in the algorithm (aside from CATS’ negligible and well-controlled Taylor series error). In addition, because CATS is agnostic to the energy function, it will be useful for performing conformational searches with the more accurate energy functions of the future (Hallen , 2016). CATS is also easily generalizable to non-protein systems, whether other macromolecules or small molecules. It is applicable in any context where local conformational perturbations are needed subject to bonding geometry constraints. One need only construct the appropriate multivariate quadratic to reflect these constraints. We believe these capabilities will make CATS useful in many kinds of designs.

56 in total

1. The penultimate rotamer library.

Authors: S C Lovell; J M Word; J S Richardson; D C Richardson
Journal: Proteins Date: 2000-08-15

2. Protein design is NP-hard.

Authors: Niles A Pierce; Erik Winfree
Journal: Protein Eng Date: 2002-10

3. Predicting resistance mutations using protein design algorithms.

Authors: Kathleen M Frey; Ivelin Georgiev; Bruce R Donald; Amy C Anderson
Journal: Proc Natl Acad Sci U S A Date: 2010-07-19 Impact factor: 11.205

4. Toward full-sequence de novo protein design with flexible templates for human beta-defensin-2.

Authors: Ho Ki Fung; Christodoulos A Floudas; Martin S Taylor; Li Zhang; Dimitrios Morikis
Journal: Biophys J Date: 2007-09-07 Impact factor: 4.033

Review 5. Progress in computational protein design.

Authors: Shaun M Lippow; Bruce Tidor
Journal: Curr Opin Biotechnol Date: 2007-07-20 Impact factor: 9.740

Review 6. Macromolecular modeling with rosetta.

Authors: Rhiju Das; David Baker
Journal: Annu Rev Biochem Date: 2008 Impact factor: 23.643

7. Conformation of amino acid side-chains in proteins.

Authors: J Janin; S Wodak
Journal: J Mol Biol Date: 1978-11-05 Impact factor: 5.469

8. Hot-spot mutants of p53 core domain evince characteristic local structural changes.

Authors: K B Wong; B S DeDecker; S M Freund; M R Proctor; M Bycroft; A R Fersht
Journal: Proc Natl Acad Sci U S A Date: 1999-07-20 Impact factor: 11.205

9. Increasing the potency and breadth of an HIV antibody by using structure-based rational design.

Authors: Ron Diskin; Johannes F Scheid; Paola M Marcovecchio; Anthony P West; Florian Klein; Han Gao; Priyanthi N P Gnanapragasam; Alexander Abadir; Michael S Seaman; Michel C Nussenzweig; Pamela J Bjorkman
Journal: Science Date: 2011-10-27 Impact factor: 47.728

10. Paxillin and ponsin interact in nascent costameres of muscle cells.

Authors: Katja Gehmlich; Nikos Pinotsis; Katrin Hayess; Peter F M van der Ven; Hendrik Milting; Aly El Banayosy; Reiner Körfer; Matthias Wilmanns; Elisabeth Ehler; Dieter O Fürst
Journal: J Mol Biol Date: 2007-03-24 Impact factor: 5.469
