
Control of repeat-protein curvature by computational protein design.

Keunwan Park¹, Betty W Shen², Fabio Parmeggiani¹, Po-Ssu Huang¹, Barry L Stoddard², David Baker³.

Abstract

Shape complementarity is an important component of molecular recognition, and the ability to precisely adjust the shape of a binding scaffold to match a target of interest would greatly facilitate the creation of high-affinity protein reagents and therapeutics. Here we describe a general approach to control the shape of the binding surface on repeat-protein scaffolds and apply it to leucine-rich-repeat proteins. First, self-compatible building-block modules are designed that, when polymerized, generate surfaces with unique but constant curvatures. Second, a set of junction modules that connect the different building blocks are designed. Finally, new proteins with custom-designed shapes are generated by appropriately combining building-block and junction modules. Crystal structures of the designs illustrate the power of the approach in controlling repeat-protein curvature.

Year: 2015; PMID: 25580576; PMCID: PMC4318719; DOI: 10.1038/nsmb.2938

Source DB: PubMed; Journal: Nat Struct Mol Biol; ISSN: 1545-9985; Impact factor: 15.369

Repeat protein scaffolds have attracted much attention as binding scaffolds that are alternatives to antibodies[1-4] and as building blocks for protein nanomaterials[5-7] because of their intrinsic modularity and high stability. The leucine-rich repeat (LRR) is a repeat protein scaffold with a horseshoe-like global structure in which the concave surface often serves as the binding interface[8]. Although LRRs share a common structural motif (LxxLxLxxN/C), different LRR modules generate proteins with distinct global curvatures when copies of the same module pack against each other[9]. Irregular LRR modules are frequently interspersed within arrays of canonical repeat modules, and their presence contributes to the curvature diversity within the family. For example, Toll-like receptor 4 (TLR4) contains three distinct regions of LRR repeats, each with a different curvature, which collectively generate a surface with high shape complementarity to the target surface of the MD2 protein[10]. Current engineering approaches have focused on changing residues at the binding surfaces of existing or consensus repeat proteins[11-16], varying the number of repeat modules[17-19], and fusing naturally occurring repeat proteins[10,20,21]. While powerful, these strategies do not allow the curvature of a repeat protein to be customized for a specific application. Here we describe a general computational design strategy to create new repeat proteins with custom-specified curvature. We demonstrate the approach by designing twelve novel proteins with different curvatures, and crystal structures show that the method controls repeat protein curvature with atomic-level accuracy.

Results

Strategy for curvature-tunable scaffold design

Our design strategy has three steps (Figure 1a). The first step is the design of a set of idealized, self-compatible building block modules (BB_1, BB_2, …, BB_n) from which a series of proteins of variable length, BB_i^n, can be created directly by varying the number n of building block repeats without any further engineering. These “homo-building block” proteins have a constant curvature defined by the base building block module. The second step is the design of a set of junction modules (JN_{BB_i→BB_j}) that connect building block module i to building block module j. A critical feature of steps one and two is that the interfaces between individual building blocks, as well as those between building blocks in junction modules, have sufficiently low energy that the relative orientation of adjacent units depends only on the identity of those units and is independent of longer-range context. This enables the third and final step, general module assembly: the combination of building block and junction modules to generate a protein with a desired overall curvature. Although the strategy is applicable to any repeat protein, in this paper we focus on LRRs. We describe the computational design and experimental characterization for each step in the following sections.

Step 1: Building block module selection and design

Nature provides a diverse set of LRR modules, with lengths from 20 to 30 amino acids[8], but only a few are self-compatible enough that repeated stacking of the same module generates a well-folded protein structure. To investigate the overall patterns of module organization in LRR structures, we built a Markov transition model for naturally occurring LRR proteins. In the model, nodes correspond to individual modules (represented by module length: L22 indicates an LRR module with 22 residues, etc.) and edges to transitions between modules, with strength proportional to the transition frequency observed between the modules in the Protein Data Bank (PDB) (see Methods). The resulting transition network (Figure 1b) has strong self-edges, corresponding to packing of identical modules, for L22 and L24, and strong mutual transitions between L28 and L29. Accordingly, we selected these LRR types to design the idealized building blocks (Figure 1d). A recently developed Rosetta repeat protein idealization method[22] was used to design ideal versions of each unit. Different instances of the naturally occurring repeat units have somewhat variable sequences; the idealization process generates a single low-energy repeat unit (both sequence and structure) guided by the available information for the family. Briefly, an idealized poly-valine backbone structure with identical repeats was generated using RosettaRemodel[23] with LRR family-specific constraints. Rosetta sequence design guided by a family-specific sequence profile was then carried out, constraining the sequence to be identical in every repeat. The idealization of the L24 module (DLRR_B) is described in Parmeggiani et al.[22]. We applied the idealization procedure to the L22 module (DLRR_A) and the two-unit {L28→L29} module (DLRR_C), and obtained the sequences and models shown in Figure 1d. Genes were synthesized for proteins containing 5 to 7 idealized building block modules. 
The N-terminal capping domain of internalin B was fused to DLRR_A and DLRR_B to enhance protein solubility and expression[12,20], whereas DLRR_C was expressed without a capping motif; instead, the sequences of its N- and C-terminal repeats were redesigned to eliminate exposed hydrophobic residues. The idealized repeat designs were expressed in E. coli and found to be soluble and highly thermostable (Table 2). We solved the crystal structures of DLRR_A (L22^6) and DLRR_B (L24^7) and found that they closely match the design models (Cα root mean square deviation (RMSD) of 1.4 Å for DLRR_A and 1.7 Å for DLRR_B). The crystal structures contain water-mediated networks localized to the convex side of the repeats; it may be possible to incorporate these in future design calculations. Each of the idealized building block repeats has the expected overall curvature: repeats of the L22 and L24 building blocks generate solenoid-like structures, whereas repeats of the {L28→L29} building block are almost circular and have a more highly curved concave surface. Parametric descriptions of the global shapes generated by each building block repeat are also provided.
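Because every repeat in a homo-building block protein is related to its neighbor by the same rigid-body transform, its constant curvature can be summarized by the parameters of the helix that this single transform generates. The following is a minimal sketch, not the parametrization used in this work: it assumes the inter-repeat rotation R and translation t are already known (for example, from superposing repeat k onto repeat k+1), and the function names `rotation_about_axis` and `helical_parameters` are hypothetical.

```python
import numpy as np


def rotation_about_axis(axis, angle):
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def helical_parameters(R, t, point):
    """Twist (degrees), rise and radius of the helix traced by `point`
    when the screw motion x -> R @ x + t is applied repeatedly.

    Assumes a non-zero rotation angle (a pure translation has no helix axis).
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    point = np.asarray(point, dtype=float)
    # Twist: rotation angle recovered from the trace of R.
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    # Helix axis direction: the eigenvector of R with eigenvalue 1.
    w, v = np.linalg.eig(R)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    axis /= np.linalg.norm(axis)
    # Displacement of the tracked point (e.g. a repeat center) after one repeat.
    d = R @ point + t - point
    rise = abs(float(d @ axis))                # identical for every point
    d_perp = d - (d @ axis) * axis             # chord in the plane normal to the axis
    # Successive positions lie on a circle: |chord| = 2 * radius * sin(theta / 2).
    radius = float(np.linalg.norm(d_perp) / (2.0 * np.sin(theta / 2.0)))
    return float(np.degrees(theta)), rise, radius


# Toy transform: 30 degrees about z plus a 2 Å shift along z, tracking a point 10 Å off-axis.
R = rotation_about_axis([0, 0, 1], np.radians(30))
print(helical_parameters(R, [0.0, 0.0, 2.0], [10.0, 0.0, 0.0]))  # roughly (30.0, 2.0, 10.0)
```

In this picture a larger twist per repeat with a small rise gives a nearly closed, highly curved shape (as for {L28→L29}), while a small twist with a larger rise gives a more open solenoid.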

Step 2: Design of junction modules

We devised a computational protocol for junction module design that takes advantage of the conserved motif (LxxLxLxxN/C) in the idealized LRR building blocks: the core residues are kept constant to maintain a stable hydrophobic core, while the evolutionarily variable positions, located primarily on the convex side, are optimized to create a low-energy interface between adjacent modules. To generate a junction module JN_{BB_i→BB_j} connecting building block i and building block j, we start from a two-unit BB_i^2 module and a one-unit BB_j module. The second unit of BB_i^2 is superimposed on BB_j by aligning the core motif residues. RosettaCM[24] is then used to generate a hybrid structure BB_i→BB_j whose coordinates are taken from the first unit of BB_i^2 before the core motif and from BB_j after the motif. The residues at the fusion interface are then optimized using RosettaDesign[25]. This redesigned hybrid two-unit structure BB_i→BB_j is the junction module JN_{BB_i→BB_j} between building block i and building block j. A special case is a three-unit junction module, JN_{BB_i→BB_w→BB_i}, that connects two copies of the same building block but has a structure different from that of the building block. We call such junction modules between two identical building blocks ‘wedge’ modules. Like other junction modules, wedge modules produce a local change in protein curvature. We designed and characterized five junction modules connecting the building block modules described in the previous section. A junction module for L22→L24 had previously been generated without hydrophobic core design[12], so we made direct fusion constructs between L22 and L24 in both directions (L22→L24 and L24→L22) to test the compatibility of the two idealized modules. The hybrid model structures showed high structural compatibility without further design; hence the junction modules in these cases are simply fusions of the two building blocks. 
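Aligning the core motif residues (and computing the Cα RMSD values quoted throughout) relies on optimal rigid-body superposition. Below is a minimal NumPy sketch of the standard Kabsch algorithm, assuming matched N×3 coordinate arrays such as the Cα atoms of the conserved LxxLxLxxN/C positions; it illustrates the generic technique and is not the RosettaCM code path. The toy "motif" coordinates are random and purely hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Optimal rotation R and translation t minimizing sum ||R @ m_i + t - t_i||^2."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mu_m).T @ (target - mu_t)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # d = -1 would signal a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_m
    return R, t


def superpose(coords, R, t):
    """Apply (R, t) to every row of an (N, 3) coordinate array."""
    return np.asarray(coords, dtype=float) @ R.T + t


def rmsd(a, b):
    """Root mean square deviation between two matched (N, 3) arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Toy check: a hypothetical 9-atom "core motif" moved by a known rigid transform.
rng = np.random.default_rng(0)
motif = rng.normal(scale=5.0, size=(9, 3))
c, s = np.cos(np.radians(40)), np.sin(np.radians(40))
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
target = motif @ R_true.T + np.array([1.0, -2.0, 3.0])

R, t = kabsch(motif, target)
print(rmsd(superpose(motif, R, t), target))   # close to zero
```

In the junction protocol, (R, t) would be computed from the motif atoms alone and then applied with `superpose` to every atom of the moving module.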
Two fusion proteins, L22→L24 (DLRR_D) and L24→L22 (DLRR_E), were expressed in E. coli and found to be soluble and monomeric in Size-Exclusion Chromatography coupled to Multiple Angle Light Scattering (SEC–MALS) experiments. Far-UV Circular Dichroism (CD) spectra and thermal denaturation profiles suggested well-packed structures with the expected secondary structure content. The fusion proteins had similar or higher stability than the original L22 (DLRR_A) or L24 (DLRR_B) designs (Table 2); L22 and L24 evidently have high compatibility despite the rarity of fusions between them in nature. The crystal structure of L24→L22 (DLRR_E) was determined at 1.9 Å resolution and was highly consistent with both the design model and the original L22 and L24 structures. Designing junction modules for L22→L28 and L24→L28 is challenging because of substantial structural differences between the modules, including module length (22 or 24 vs. 28), secondary structure in the variable region (3₁₀-helix or loop vs. α-helix), curvature of the concave surface (moderately vs. highly curved) and global shape (superhelical vs. circular). The initial fusion models generated by RosettaCM[24] (before redesign) contained side-chain clashes and cavities at the interface between the modules. Residues at the fusion interface were therefore redesigned to improve the all-atom Rosetta energy and packing, as assessed by RosettaHoles[26]. The junction designs were based solely on building block models generated by Rosetta[25] because the crystal structures of the building blocks had not been determined. Six designs for L22→L28 (DLRR_F) and six designs for L24→L28 (DLRR_G) were experimentally characterized (Table 2). All designs were expressed in E. coli and found to be highly soluble and monomeric in SEC–MALS experiments. 
They displayed well-defined far-UV CD spectra with a minimum near 218 nm, similar to those of previously characterized LRRs with primarily β-sheet secondary structure. Thermal denaturation experiments showed cooperative unfolding for all fusion designs, suggesting a well-packed hydrophobic core. Fusing more stable LRR modules to less stable LRR modules via a well-designed junction appears to increase overall stability: all of the designs containing junction modules were more stable than the original {L28→L29}^5 design (DLRR_C). We determined the crystal structure of the L24→L28 fusion DLRR_G3 to evaluate the accuracy of the design. The crystal structure, determined at 2.5 Å resolution, shows the atomic details of the junction module as well as the structures of the L24 and {L28→L29} modules. The assumption underlying our approach, that curvature can be controlled locally, is supported by the similarity of the L24 modules in the DLRR_G3 structure to those in the all-L24 DLRR_B structure (Cα RMSD 0.3 Å), and by the similarity of its {L28→L29} modules to those in the DLRR_C model (Cα RMSD 1.3 Å). The key core side-chain interactions in the junction module are very similar in the design model and the crystal structure (Cα RMSD 0.9 Å). In addition to the junction modules linking different building block modules, we designed a wedge module inserted between L24 modules. In native LRR proteins, inserting an ‘irregular’ module between regular modules is a common way to generate structural diversity, either by altering the overall curvature or by forming a binding interface other than the concave surface (for example, the diverse LRR module organization and irregular binding surfaces in the TLR family[27] and in plant LRR proteins[28]). The idealized L24 repeat structure (DLRR_B) was chosen as the base scaffold because it had the highest stability of the three idealized LRRs. 
For the wedge module design, L24→[LRR of any length]→L24 triples were retrieved from the LRRML database[29] to identify irregular modules flanked by L24 modules. A total of 21 unique irregular modules were identified. We selected the 32-residue LRR unit (L32) found in the Toll-like receptor 3 structure[30] (PDB ID: 2A0Z, residues 532–563) as a starting point. L32 has a relatively rigid, structured loop on its convex surface that could be useful in future binding pocket designs. The junction module design process was applied to the two fusion interfaces (L24→L32 and L32→L24), yielding the wedge module JN_{L24→L32→L24} (DLRR_H). Four L24→L32→L24 designs were selected and experimentally characterized (Table 2). All designs were expressed in E. coli and found to be soluble, and two were monomeric in SEC–MALS experiments. Thermal denaturation experiments showed that insertion of the wedge module generally decreased the stability of the base scaffold, but unfolding was still cooperative. The crystal structure of DLRR_H2, determined at 2.9 Å resolution, was consistent with the design model (Cα RMSD 0.9 Å), confirming the accuracy of the junction module design protocol.
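The search for irregular modules flanked by L24 units amounts to a sliding-window scan over module-length sequences. A minimal sketch, assuming each LRR protein is represented as a list of module lengths in N- to C-terminal order; parsing the LRRML database, and whatever criterion was used to call two irregular modules "unique", are not shown. The identifiers and lengths in the example are invented.

```python
def flanked_modules(proteins, flank=24):
    """Find modules whose neighbours on both sides have length `flank`
    but which are themselves of a different length.

    `proteins` maps a (hypothetical) protein identifier to its module lengths.
    Returns (protein_id, module_index, module_length) tuples.
    """
    hits = []
    for pid, lengths in proteins.items():
        for i in range(1, len(lengths) - 1):
            left, middle, right = lengths[i - 1], lengths[i], lengths[i + 1]
            if left == flank and right == flank and middle != flank:
                hits.append((pid, i, middle))
    return hits


# Hypothetical module-length sequences for two LRR proteins.
example = {"protA": [24, 24, 32, 24, 24], "protB": [24, 26, 24, 28]}
print(flanked_modules(example))
```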

Step 3: Curvature specification by general module assembly

The crystal structures described thus far demonstrate that the building block modules (L22, L24, L28 and L29), junction modules (L22→L24, L24→L22, L24→L28, L28→L29 and L29→L28) and wedge modules (L24→L32→L24) all have structures very similar to the design models regardless of overall protein context. In principle, this enables the design of module combinations that achieve a desired curvature. We represent the space of possible LRR structures as a network of building block modules (nodes) connected by junction modules (edges), as in Figure 1b. Any sequence of modules generated by following the edges of the network corresponds to an LRR structure with a unique curvature. For example, all 18,786 possible fusion structures consisting of 12 modules can be depicted as lines connecting the centers of mass of the repeat modules in each structure. This curvature diversity is orders of magnitude greater than that of the original idealized LRRs containing the same number of building block modules.
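Because each module sequence corresponds to a walk through the module network, the number of distinct structures of a given length can be counted by dynamic programming. A minimal sketch over a hypothetical network: the node and edge sets below are illustrative only, not the authors' module set, so the counts will not reproduce the 18,786 figure, which also depends on how capping, wedge and two-unit modules are counted.

```python
# Hypothetical module network: an edge a -> b means module b may follow module a
# (through a designed junction or a self-compatible interface).
NETWORK = {
    "L22": ["L22", "L24", "L28L29"],
    "L24": ["L24", "L22", "L28L29", "L32"],
    "L32": ["L24"],          # wedge: L32 is always followed by L24
    "L28L29": ["L28L29"],    # the two-unit {L28->L29} module only continues with itself
}


def count_sequences(network, length, start_nodes=None):
    """Number of module sequences (walks) containing `length` modules."""
    nodes = list(network)
    starts = nodes if start_nodes is None else start_nodes
    # ways[n] = number of walks of the current length that end at node n
    ways = {n: (1 if n in starts else 0) for n in nodes}
    for _ in range(length - 1):
        nxt = {n: 0 for n in nodes}
        for a, count in ways.items():
            for b in network[a]:
                nxt[b] += count
        ways = nxt
    return sum(ways.values())


for n in (1, 2, 3, 12):
    print(n, count_sequences(NETWORK, n))
```

The count grows roughly geometrically with length, which is why even a handful of building block and junction modules yields a very large space of curvatures.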

Figure 1

(a) Overview of curvature-tunable scaffold design: idealized building block module design, junction module design, and general module assembly. (b) Module organization of natural LRR proteins, represented as a network in which nodes represent modules and edges represent transitions between modules. The size of nodes and the thickness of edges are proportional to the frequencies observed in the PDB. (c) Graphical representation of building block and junction modules. (d) Idealized building block module structures and sequences. The highly conserved residues are shown as sticks.

In the general module assembly process, we chose to use models of the individual building block and junction modules extracted from the crystal structures described above rather than the original design models of these units. While the building block modules are similar to previously described structures, the designed junction modules have sequences and structures quite different from previously described LRRs. Because computational protein design remains imperfect, we consider the crystal structures (which differ from the design models by Cα RMSD 0.2–1.0 Å) to be more accurate representations of the structures these modules are likely to adopt in new designs.

General module assembly and experimental characterization

As a proof of concept for general module assembly, we designed four multiple-fusion constructs (DLRR_I, DLRR_J, DLRR_K and DLRR_L; Table 2). The designs contain more than two fusion interfaces, resulting in large superhelical structures comparable in size to TLR4[31] (PDB ID: 3FXI, 626 residues) and the plant steroid receptor BRI1[32] (PDB ID: 3RIZ, 743 residues). The module organization of each design is listed in Table 2. Experimental characterization showed that the general module assembly protocol is quite robust: all of the multiple-fusion designs were expressed in E. coli and found to be soluble and monomeric, with well-defined CD spectra, cooperative unfolding transitions and high thermal stability. This is notable because all are large and complex proteins. We solved the crystal structures of two of the designs, as described in the following two paragraphs.

Design DLRR_I contains two successive L32 wedge modules flanked by multiple N- and C-terminal L24 modules. We solved the crystal structure of DLRR_I at 1.7 Å resolution. Consistent with the assumption that individual modules have context-independent structures, the two L32 wedge modules in DLRR_I and the single L32 wedge module in DLRR_H2 are nearly identical over the backbone and core side chains (Cα RMSD 0.3–0.5 Å). Over the full 10-repeat-unit structure, the crystal structure is closer to the model assembled from the crystal structures of the individual building block and junction modules extracted from DLRR_B and DLRR_H (Cα RMSD 0.5 Å) than to the model assembled from the design models of the individual modules (Cα RMSD 1.7 Å). This supports our decision to use the crystal structures of the modules rather than the original design models in the general module assembly calculations.

Design DLRR_K consists of two L24 modules followed by the L32 module, three additional L24 modules, the L24→L28 junction module and three {L28→L29} modules, a total of 15 repeat units. 
Such complex module organization is rarely, if ever, observed in naturally occurring LRRs. The protein is monomeric and stable, with a Tm of 75 °C. The 2.8 Å crystal structure of DLRR_K is very close to the general module assembly model (built from the previously determined crystal structures of the individual modules), with a Cα RMSD of 1.1 Å. Taken together, these data suggest that general module assembly based on designed building block and junction modules can produce new structures with predefined shapes robustly and accurately.

Discussion

We have described a general approach to creating repeat proteins with custom-designed shapes by combining designed building block and junction modules. Generating scaffolds with defined curvatures by our computational approach is very likely simpler than the complex evolutionary process that produced naturally occurring LRRs, and is considerably more controlled than what can be achieved with library selection approaches. The strategy allows ready programming of a rich diversity of scaffolds with distinct curvatures: over 18,000 distinct 12-repeat-unit structures can in principle be generated with our current set of building block and junction modules. The stable and well-expressed DLRR_L design (Table 2) has a complex organization with five different types of modules (19 repeat units in total); at this length there are over 5,000,000 distinct possibilities with our current module set, and expanding the repertoire of idealized building block and junction modules would enrich the curvature diversity still further. Our approach integrates protein structural analysis with energy-driven design calculations to arrive at the idealized building block and junction modules, and combines computation and experiment to obtain high-accuracy models of the complex repeat proteins generated by module assembly. Although a purely energy-driven approach would be preferable on aesthetic grounds, using information extracted from naturally occurring LRRs and from the crystal structures of the idealized LRRs described in this study allows the generation of large families of LRR proteins with tunable curvatures to address current challenges. 
The critical role of computation in the overall process is illustrated by the junction modules: both the sequences and the structures of the designed junction modules differ considerably from their closest counterparts in naturally occurring LRRs and hence could not have been obtained without energy-driven design calculations. These calculations are not perfect, however, and because small differences between the design models and the corresponding crystal structures are amplified through lever-arm effects when many modules are combined, we use the crystal structures of the designed building block and junction modules, rather than the original design models, in the general module assembly calculations. The ability to custom-design repeat proteins with well-defined shapes and curvatures has immediate application to the design of a next generation of high-affinity binding proteins. Studies of native protein-protein interactions have shown that shape complementarity is a major determinant of binding affinity[33-36]. In particular, naturally occurring LRR-based binding proteins often achieve high affinity and specificity through shapes that closely conform to the surfaces of their target proteins. The importance of this shape tuning for LRR molecular recognition is illustrated by the naturally occurring LRR proteins internalin A (InlA) and ribonuclease inhibitor (RI). Each protein has a curvature adapted to its target (E-cadherin and ribonuclease A, respectively), resulting in well-packed, complementary protein-protein interfaces with hot-spot clusters at both the N and C termini. In contrast, swapping the targets (that is, InlA:ribonuclease A and RI:E-cadherin complexes) results in significant clashes and large gaps. 
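The lever-arm argument can be made concrete with a toy model in which every module turns the chain by the same angle and advances by the same distance, so that a small per-module error in the turn accumulates along the chain. A minimal 2D sketch; the 5 Å step, 10° turn and 0.5° error are arbitrary illustrative values, not measurements from this work.

```python
import numpy as np


def chain_positions(n_modules, turn_deg, step=5.0):
    """Centers of successive modules in a planar chain that turns by turn_deg per module."""
    positions = [np.zeros(2)]
    heading = 0.0
    for _ in range(n_modules - 1):
        heading += np.radians(turn_deg)
        positions.append(positions[-1] + step * np.array([np.cos(heading), np.sin(heading)]))
    return np.array(positions)


def end_deviation(n_modules, turn_deg, error_deg, step=5.0):
    """Distance between the last module of an ideal chain and that of a chain
    whose per-module turn is off by error_deg."""
    ideal = chain_positions(n_modules, turn_deg, step)
    perturbed = chain_positions(n_modules, turn_deg + error_deg, step)
    return float(np.linalg.norm(ideal[-1] - perturbed[-1]))


for n in (5, 10, 15, 20):
    print(n, round(end_deviation(n, turn_deg=10.0, error_deg=0.5), 2))
```

The same 0.5° error per module produces a much larger displacement at the end of a 20-module chain than of a 5-module chain, which is why small model errors matter more in long assemblies.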
With the capability provided by the approach described in this paper, it is now possible to design novel proteins with high backbone shape complementarity to essentially any macromolecular target of interest. Coupled with protein interface design methodology previously used to create new binding proteins based on existing scaffolds[37,38], this should allow the design of binding proteins with high affinity and specificity. Such an approach complements directed evolution methods[13,39,40] for obtaining high-affinity binding proteins from a single stable protein backbone; these methods, although powerful, still require considerable effort. For creating a high-affinity binding protein to a target of interest in the near future, a combination of our shape-complementary scaffold design approach, protein-protein interface design for chemical complementarity, and limited directed evolution to optimize interactions not accurately described by computational design may prove particularly effective.

Methods

Markov transition model for natural LRR modules

To construct a Markov transition model for natural LRR modules, all pairs of consecutive LRR modules were collected from the LRRML database[29] and labeled by module length. From these data, we computed the transition probability $P_{a \to b} = N_{a \to b} / \sum_{c} N_{a \to c}$, where $N_{a \to b}$ is the number of observed transitions from a module of length a to a module of length b in the PDB. In the network model (Figure 1b), the size of each node was scaled by the frequency of that module length in the PDB, and the thickness of each edge by the transition probability.
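The transition probabilities can be computed in a few lines once the module-length sequences of LRR proteins are available. A minimal sketch, assuming each protein is given as a list of module lengths in N- to C-terminal order (parsing the LRRML database is not shown) and assuming row normalization over the outgoing transitions of each module length; the toy input is invented for illustration.

```python
from collections import Counter, defaultdict


def module_frequencies(proteins):
    """How often each module length occurs (used to scale node sizes)."""
    return Counter(length for lengths in proteins for length in lengths)


def transition_probabilities(proteins):
    """P(a -> b) = N(a -> b) / sum_c N(a -> c) over consecutive module pairs."""
    pair_counts = Counter()
    for lengths in proteins:
        pair_counts.update(zip(lengths, lengths[1:]))   # consecutive (a, b) pairs
    outgoing = defaultdict(int)
    for (a, _b), n in pair_counts.items():
        outgoing[a] += n
    return {(a, b): n / outgoing[a] for (a, b), n in pair_counts.items()}


# Hypothetical module-length sequences, N to C terminus.
toy = [[24, 24, 24], [28, 29, 28, 29], [22, 22, 24]]
print(module_frequencies(toy))
print(transition_probabilities(toy))
```

Strong self-edges (such as 24 -> 24 here) and strong mutual edges (28 -> 29 and 29 -> 28) are exactly the patterns used above to pick building block candidates.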

Computational design of junction modules

The initial fusion models were generated by RosettaCM[24] from the motif-aligned scaffolds as described in the main text, and refined with the Rosetta relax protocol with coordinate constraints[41] to limit perturbation of the structure. The fusion interface between the two heterogeneous building blocks was then redesigned to improve structural compatibility using the Rosetta FastRelax protocol. The protocol runs four cycles of repacking, design and minimization; within each cycle the weight of the repulsive energy term is gradually increased to obtain well-packed, low-energy structures. During design, residue type constraints were added to favor the original residue identities. After 1,000 design sequences were generated, those ranking in the top 10% by both Rosetta energy and packing were retrieved and manually inspected to select the final sequences.
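The final selection step keeps designs that rank highly on two metrics at once. A minimal sketch, assuming that "top 10% by both" means the intersection of the two top-10% lists and that lower values are better for both the Rosetta energy and the packing score (the sign convention of the packing metric actually used should be checked); the function name and the toy scores are hypothetical.

```python
def top_fraction_by_both(designs, fraction=0.10):
    """Return names of designs in the top `fraction` by energy AND by packing.

    `designs` maps a design name to (rosetta_energy, packing_score).
    Lower is assumed to be better for both scores.
    """
    n_keep = max(1, round(len(designs) * fraction))
    by_energy = sorted(designs, key=lambda name: designs[name][0])[:n_keep]
    by_packing = sorted(designs, key=lambda name: designs[name][1])[:n_keep]
    return sorted(set(by_energy) & set(by_packing))


# Toy example with 20 hypothetical designs (10% -> 2 kept per metric).
designs = {f"d{i}": (20.0 + i, 20.0 + i) for i in range(3, 20)}
designs.update({"d0": (0.0, 0.0), "d1": (1.0, 10.0), "d2": (10.0, 1.0)})
print(top_fraction_by_both(designs))
```

Requiring both criteria discards designs such as d1 and d2 that score well on only one metric; the survivors would then go to manual inspection.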

LRR structure modeling by iterative module assembly

Building block and junction module structures were extracted from the crystal structures of the designed LRR proteins containing one or two building block module types. Specifically, two-unit or three-unit module structures of Ncap-L22 (DLRR_A), L22-L22 (DLRR_A), Ncap-L24 (DLRR_B), L24-L24 (DLRR_B), L22→L24 (DLRR_B), L24→L22 (DLRR_E), L24→L28→L29 (DLRR_G3), L28→L29 (DLRR_G3), L29→L28 (DLRR_G3) and L24→L32→L24 (DLRR_H2) were used to elongate an LRR structure through module assembly mediated by the common flanking module. For example, assembling L22→L24 and L24→L22 through the common L24 unit generates the three-unit structure L22→L24→L22. Module assembly was applied iteratively, extending the structure one module at a time until the complete LRR structure was built. Finally, energy minimization with Rosetta was performed to eliminate potential structural defects. The crystal structure of L22→L24 was taken from DLRR_B, in which the L22-containing N-terminal capping domain (of internalin B) is fused to L24. The L22→L28 junction module was not used in general module assembly because its crystal structure was not available.
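The iterative assembly can be expressed as composing rigid-body transforms: if each two-unit module is summarized by the transform relating its two unit frames, superimposing the shared unit and appending the next one amounts to multiplying transforms along the chain. A minimal sketch, assuming such per-junction 4×4 transforms have already been extracted from the crystal structures; the function names and the toy 90° junction are hypothetical, and the final Rosetta minimization is not modeled.

```python
import numpy as np


def homogeneous(R, t):
    """Pack a rotation matrix and translation vector into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def assemble(sequence, junctions):
    """Return the global 4x4 frame of every unit in `sequence`.

    junctions[(a, b)] maps coordinates in the local frame of unit b into the
    local frame of the preceding unit a (hypothetically measured from a
    two-unit crystal structure). Chaining these transforms is equivalent to
    superimposing the shared unit of each new module onto the growing chain.
    """
    frames = [np.eye(4)]
    for a, b in zip(sequence, sequence[1:]):
        frames.append(frames[-1] @ junctions[(a, b)])
    return frames


def place_unit(frame, local_coords):
    """Transform an (N, 3) array of unit coordinates into the global frame."""
    local = np.asarray(local_coords, dtype=float)
    return local @ frame[:3, :3].T + frame[:3, 3]


# Toy junction: rotate 90 degrees about z and shift 1 Å along x; four copies close a square.
rot90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
junctions = {("A", "A"): homogeneous(rot90, [1.0, 0.0, 0.0])}
frames = assemble(["A"] * 5, junctions)
print([f[:3, 3].round(6).tolist() for f in frames])
```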

Gene cloning, protein expression and purification

Genes encoding building block LRRs were synthesized and cloned into the pET21_NESG (DLRR_A) or pET15_NESG (DLRR_B and DLRR_C) expression vectors by GenScript. The gene fragment for each junction module was prepared either by PCR assembly of six to eight 50–60-nucleotide oligonucleotides or by gene synthesis (Integrated DNA Technologies). A second gene fragment, encoding the building block module to be fused, was obtained by PCR. The two gene fragments were then inserted into the plasmid of the appropriate building block protein by Gibson cloning[42]. A C-terminal 6×His tag was added to all design sequences with Gly-Ser or Gly-Ser-Trp linkers; the Trp was included to make protein concentration easy to measure. The proteins were expressed in E. coli BL21 Star (DE3) cells at 37 °C for 4 hours after induction with 0.1 mM IPTG. The cell pellets were resuspended in 20 ml of lysis buffer containing 20 mM Tris, 500 mM NaCl, 30 mM imidazole and 5% v/v glycerol (pH 8.0), supplemented with a Roche cOmplete EDTA-free protease inhibitor tablet, lysozyme (1 mg/ml) and DNase (1 mg/ml). After sonication, the proteins were purified on a Ni-NTA column and eluted with 20 mM Tris, 500 mM NaCl, 250 mM imidazole (pH 8.0). The proteins were further purified on a Superdex 200 column (GE Healthcare) equilibrated in 20 mM Tris and 50 mM NaCl at pH 8.0. Soluble expression and purity were also assessed by SDS-PAGE and mass spectrometry (LCQ Fleet Ion Trap Mass Spectrometer, Thermo Scientific).
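The Trp in the linker contributes to absorbance at 280 nm, the usual basis for this kind of concentration measurement. A minimal sketch of the Beer–Lambert calculation, assuming the widely used Pace et al. (1995) residue coefficients (5,500 M⁻¹ cm⁻¹ per Trp, 1,490 per Tyr, 125 per disulfide-bonded cystine); the paper does not state which coefficients were used, and the peptide below is hypothetical rather than a design sequence.

```python
def extinction_coefficient_280(sequence, cystines=0):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Trp/Tyr content.

    `cystines` is the number of disulfide-bonded Cys pairs (free Cys is ignored).
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def molar_concentration(a280, sequence, path_length_cm=1.0, cystines=0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    epsilon = extinction_coefficient_280(sequence, cystines)
    if epsilon == 0:
        raise ValueError("sequence has no Trp/Tyr/cystine; A280 is not usable")
    return a280 / (epsilon * path_length_cm)


# Hypothetical Gly-Ser-Trp linker plus His tag: the single Trp gives 5500 M^-1 cm^-1.
tag = "GSWHHHHHH"
print(extinction_coefficient_280(tag))
print(molar_concentration(0.55, tag))   # molar concentration for A280 = 0.55, 1 cm path
```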

Biophysical characterization

Secondary structure content and thermal stability were measured by Circular Dichroism (CD) on an AVIV 62S DA spectrometer. Far-UV CD spectra from 200 nm to 260 nm were recorded for protein samples in 20 mM Tris and 50 mM NaCl (pH 8.0). Thermal denaturation was monitored by following the CD signal at the 218 nm minimum while increasing the temperature from 25 °C to 90 °C. Size-Exclusion Chromatography coupled to Multiple Angle Light Scattering (SEC-MALS) was performed to assess the oligomeric state of protein samples. A Superdex 200 10/300 GL column (GE Healthcare) was equilibrated in phosphate-buffered saline (PBS) and run on an HPLC system (LC 1200 Series, Agilent Technologies) connected to a miniDAWN TREOS static light scattering detector (Wyatt Technology). The data were analyzed with ASTRA software (Wyatt Technology).
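Table 2 reports Tm as the inflection point of the melting curve at 218 nm. A minimal sketch of that estimate, assuming the inflection point is taken as the temperature of steepest slope of the raw signal, with no baseline fitting or smoothing (noisy real data would likely need both, and curves that do not reach an unfolded baseline by 90 °C give less reliable estimates); the synthetic curve and its parameters are invented.

```python
import numpy as np


def tm_from_melt(temperatures, signal):
    """Estimate Tm as the inflection point of a melting curve, taken here as
    the temperature at which |d(signal)/dT| is largest."""
    T = np.asarray(temperatures, dtype=float)
    y = np.asarray(signal, dtype=float)
    slope = np.gradient(y, T)          # central differences in the interior
    return float(T[np.argmax(np.abs(slope))])


# Synthetic two-state melt (hypothetical numbers): folded signal -20, unfolded -5, Tm 75 °C.
T = np.arange(25.0, 90.5, 0.5)
signal = -20.0 + 15.0 / (1.0 + np.exp(-(T - 75.0) / 2.0))
print(tm_from_melt(T, signal))   # prints 75.0
```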

X-ray crystallography

Crystals of designed LRR repeat proteins were grown by standard vapour diffusion methods using a TTP Labtech 'Mosquito' crystallization robot, with 50-nanoliter drops of protein at concentrations ranging from 15 mg/mL to 40 mg/mL equilibrated against 100-microliter volumes of individual reservoir solutions. The reservoir compositions that produced each crystal are provided separately. Crystals were then flash-cooled by rapid immersion into artificial mother liquors corresponding to the crystallization reservoir solutions, supplemented with either ethylene glycol (to 25% v/v) or PEG3350 (to 35% w/v). Diffraction data were collected on cryocooled crystals using either an in-house CCD area detector with a rotating-anode X-ray generator (DLRR_A, DLRR_G3, DLRR_H2, DLRR_K) or a CCD area detector at the Advanced Light Source synchrotron facility (DLRR_E, DLRR_I). All data were processed and scaled using the HKL2000 program suite[43]. Molecular replacement was performed with PHASER[44], using the Rosetta models of the individual designs as search models. Model building was performed in COOT[45] and refinement with REFMAC[46].

Table 1

Data collection and refinement statistics (molecular replacement)

Crystal	DLRR_A	DLRR_E	DLRR_G3	DLRR_H2	DLRR_I	DLRR_K
Data collection
Space group	P2₁	P2₁2₁2₁	F222	P2₁2₁2₁	C2	P22₁2₁
Cell dimensions
a, b, c (Å)	57.66, 245.07, 57.73	32.12, 77.71, 101.89	91.13, 136.38, 161.74	89.78, 96.50, 136.36	109.49, 42.71, 67.82	36.87, 93.37, 126.24
α, β, γ (°)	90, 115.36, 90	90, 90, 90	90, 90, 90	90, 90, 90	90, 102.4, 90	90, 90, 90
Resolution (Å)	50(2.36)	42.6(1.93)	23.5(2.53)	50(2.9)	50(1.73)	50(2.8)
R_sym	0.081(0.183)	0.063(0.171)	0.067(0.153)	0.092(0.529)	0.076(0.252)	0.192(0.742)
I/σI	24.0(6.4)	17.7(8.0)	15.5(4.1)	17.2(3.85)	33.7(4.5)	8.9(2.3)
Completeness (%)	96.7(83.7)	99.8(96.0)	98.1(85.5)	99.8(99.6)	96.3(83.7)	99.7(99.2)
Redundancy	5.7(3.0)	6.4(5.0)	4.5(1.9)	7.2(7.1)	10.3(2.3)	6.2(5.8)

Refinement
Resolution (Å)	50(2.36)	42.6(1.93)	23.5(2.53)	50(2.9)	50(1.73)	50(2.8)
No. reflections	34180	19993	17061	25484	31150	10729
R_work (%)	18.9(22.3)	15.86(17.50)	18.47(23.4)	21.16(32.8)	17.07(21.50)	20.75(28.4)
R_free (%)	24.2(27.7)	22.38(23.7)	24.65(36.1)	25.15(48.5)	21.99(31.70)	28.53(36.0)
No. atoms
Protein	6771	2388	3456	7841	2577	3582
Ligand/ion	8	12	29	20	---	1
Water	230	106	96	1	199	18
B-factors
Protein	12.13	14.65	11.73	69.14	10.84	16.96
Ligand/ion	35.53	39.67	54.64	85.26	---	42.89
Water	18.0	35.39	21.91	50.44	25.43	18.60
r.m.s. deviations
Bond length (Å)	0.0137	0.0181	0.0138	0.0126	0.0194	0.0136
Bond angles (°)	1.661	1.810	1.629	1.651	2.052	1.475

Values in parentheses are for highest-resolution shell.

Table 2

Summary of fusion designs and experimental results

Design Name	Module organization[§]	Modules[¶] (repeat units)	designs tested	Soluble	Folded (CD)	Mono meric	X- ray	T_m (°C)	RMSD[1]	RMSD[2]
DLRR_A	Ncap–L22[6]	6 (6)	1	1	1	1	1	73	1.4 (0.8, 2.0)	0.4 (0.5, 1.0)
DLRR_B[Δ]	Ncap–L24[7]	7 (7)	1	1	1	1	1	78	1.7 (1.5, 2.9)	0.3 (0.6, 0.4)
DLRR_C	{L28→L29}[5]	5 (10)	5	5	1	#		71
DLRR_D	Ncap–L22[4]→L24[5]	9 (9)	1	1	1	1		87
DLRR_E	Ncap–L24[5]→L22[5]	10 (10)	1	1	1	1	1	77	2.1 (1.4, 2.0)	0.4 (0.7, 0.7)
DLRR_F	Ncap–L22[4]–JN_L22→L28→L29→{L28→L29}[3]	9 (13)	6	6	6	6		77
DLRR_G	Ncap–L24[5]–JN_L24→L28→L29→{L28→L29}[3]	10 (14)	6	6	6	6	1	81	2.6 (3.1, 3.8)	0.8 (0.8, 2.2)
DLRR_H	Ncap–L24[2]–JN_{L24→L32→L24}–L24[2]	5 (7)	4	4	4	2	1	65	0.9 (0.5, 1.0)	0.8 (0.7, 1.2)
DLRR_I	Ncap–L24[2]–JN_{L24→L32→L24}–JN_{L24→L32→L24}–L24[2]	6 (10)	1	1	1	1	1	53	1.7 (1.2, 2.3)	0.5 (0.5, 0.7)
DLRR_J	Ncap–L22[4]→L24[2]–JN_L24→L28→L29→{L28→L29}[2]	10 (13)	1	1	1	1		82
DLRR_K	Ncap–L24[2]–JN_{L24→L32→L24}–L24[3]–JN_L24→L28→L29→{L28→L29}[2]	10 (15)	1	1	1	1	1	75	†	1.1 (1.2, 3.9)
DLRR_L	Ncap–L22[3]→L24[3]–JN_{L24→L32→L24}–L24[3]–JN_L24→L28→L29→{L28→L29}[2]	14 (19)	1	1	1	1		83

[§] Bracketed superscripts give the number of repeat units.

[¶] The alternating two-unit {L28→L29} segment is counted as one module.

[Δ] Experimental data for DLRR_B are from Parmeggiani et al.[22]

# DLRR_C forms a dimer.

T_m was estimated as the inflection point of the melting curve recorded at 218 nm; where several designs were tested, the highest T_m is reported.

RMSD[1]: Cα RMSD (Å) between the crystal structure and the model assembled from the design models of the building-block and junction modules.

RMSD[2]: Cα RMSD (Å) between the crystal structure and the model assembled from the crystal structures of the building-block and junction modules (Supplementary Fig. 5c).

Values in parentheses are the RMSDs of the first and last units after global structural alignment.

† The model of DLRR_K was generated by module assembly without an initial design model, so no RMSD[1] is reported.
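The Modules column of Table 2 can be recovered from the module-organization strings using the counting rules in the § and ¶ footnotes. The Python sketch below is a hypothetical helper (`count_modules` is not from the paper). Beyond those footnotes, it assumes that a single-repeat junction such as `JN_L24` counts as one module of one repeat, that a bare `L28→L29` pair counts as one two-repeat module just like the bracketed `{L28→L29}`, that a bracketed junction such as `JN_{L24→L32→L24}` counts as one module spanning all of its repeats, and that the capping repeat `Ncap` is not counted. Under these assumptions it reproduces the module and repeat-unit counts listed for each design.

```python
import re

# Hypothetical tokenizer for the "Module organization" strings of Table 2.
# Alternatives are tried left to right, so the more specific patterns come first;
# separators such as "–", "-" and "→" between tokens are simply skipped.
TOKEN = re.compile(
    r"JN_\{(?P<jn>[^}]*)\}"                         # multi-repeat junction, e.g. JN_{L24→L32→L24}
    r"|\{(?P<grp>[^}]*)\}(?:\[(?P<grp_n>\d+)\])?"   # repeated group, e.g. {L28→L29}[3]
    r"|JN_L\d+"                                     # single-repeat junction, e.g. JN_L24
    r"|L28→L29"                                     # bare alternating pair (one module, two units)
    r"|Ncap"                                        # N-terminal cap (assumed not counted)
    r"|L\d+(?:\[(?P<n>\d+)\])?"                     # building block, optionally repeated n times
)
REPEAT = re.compile(r"L\d+")


def count_modules(organization):
    """Return (modules, repeat_units) for one Table 2 module-organization string."""
    modules = units = 0
    for m in TOKEN.finditer(organization):
        token = m.group(0)
        if token == "Ncap":
            continue
        if m.group("jn") is not None:
            # assumed: a junction is one module, however many repeats it spans
            modules += 1
            units += len(REPEAT.findall(m.group("jn")))
        elif m.group("grp") is not None:
            n = int(m.group("grp_n") or 1)
            modules += n
            units += n * len(REPEAT.findall(m.group("grp")))
        elif token == "L28→L29":
            modules += 1
            units += 2
        elif token.startswith("JN_"):
            modules += 1
            units += 1
        else:
            n = int(m.group("n") or 1)
            modules += n
            units += n
    return modules, units
```

For example, `count_modules("Ncap–L24[2]–JN_{L24→L32→L24}–L24[2]")` returns `(5, 7)`, matching the DLRR_H row.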

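RMSD[1] and RMSD[2] are Cα RMSDs after global structural alignment, and the parenthesized values are RMSDs over the first and last units under that same alignment. A common way to compute such numbers is Kabsch superposition, sketched below in Python with NumPy. The source does not say which superposition software was used, so this choice is an assumption; the coordinates are also assumed to be matched residue by residue as (N, 3) arrays of Cα positions, and the per-unit values are read here as subsets of the globally fitted coordinates, without re-fitting.

```python
import numpy as np


def superpose(mobile, target):
    """Return `mobile` rigidly superposed onto `target` with the Kabsch algorithm.

    Both inputs are (N, 3) arrays of matched Cα coordinates.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # proper rotation (det = +1)
    return (P - p_mean) @ R.T + q_mean


def rmsd(a, b, idx=None):
    """Cα RMSD between matched coordinate sets, optionally over a residue subset."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if idx is not None:
        a, b = a[idx], b[idx]
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

For example, `rmsd(superpose(model, crystal), crystal, idx=slice(0, n))` would give the value for a first unit of `n` residues, where `model`, `crystal` and `n` are placeholders rather than data from the paper.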
References (46 in total; 10 shown)

B Kobe, A V Kajava. The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol, 2001-12.

Martyn D Winn, Garib N Murshudov, Miroslav Z Papiz. Macromolecular TLS refinement in REFMAC at moderate resolutions. Methods Enzymol, 2003.

Ewan R G Main, Yong Xiong, Melanie J Cocco, Luca D'Andrea, Lynne Regan. Design of stable alpha-helical arrays from an idealized TPR motif. Structure, 2003-05.

Patrik Forrer, Michael T Stumpp, H Kaspar Binz, Andreas Plückthun. A novel strategy to design binding molecules harnessing the modular nature of repeat proteins. FEBS Lett, 2003-03-27.

Purevjav Enkhbayar, Masakatsu Kamiya, Mitsuru Osaki, Takeshi Matsumoto, Norio Matsushima. Structural principles of leucine-rich repeat (LRR) proteins. Proteins, 2004-02-15.

Rong Chen, Zhiping Weng. A novel shape complementarity scoring function for protein-protein docking. Proteins, 2003-05-15.

Paul Emsley, Kevin Cowtan. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr, 2004-11-26.

H Kaspar Binz, Patrick Amstutz, Andreas Plückthun. Engineering novel binding proteins from nonimmunoglobulin domains. Nat Biotechnol, 2005-10.

Jessica K Bell, Istvan Botos, Pamela R Hall, Janine Askins, Joseph Shiloach, David M Segal, David R Davies. The molecular structure of the Toll-like receptor 3 ligand-binding domain. Proc Natl Acad Sci U S A, 2005-07-25.

H Kaspar Binz, Patrick Amstutz, Andreas Kohl, Michael T Stumpp, Christophe Briand, Patrik Forrer, Markus G Grütter, Andreas Plückthun. High-affinity binders selected from designed ankyrin repeat protein libraries. Nat Biotechnol, 2004-04-18.
