Literature DB >> 24758720

Automated structure refinement for a protein heterodimer complex using limited EPR spectroscopic data and a rigid-body docking algorithm: a three-dimensional model for an ankyrin-CDB3 complex.

Sarah J Edwards¹, Christopher W Moth, Sunghoon Kim, Suzanne Brandon, Zheng Zhou, Charles E Cobb, Eric J Hustedt, Albert H Beth, Jarrod A Smith, Terry P Lybrand.

Abstract

We report here specialized functions incorporated recently in the rigid-body docking software toolkit TagDock to utilize electron paramagnetic resonance derived (EPR-derived) interresidue distance measurements and spin-label accessibility data. The TagDock package extensions include a custom methanethiosulfonate spin label rotamer library to enable explicit, all-atom spin-label side-chain modeling and scripts to evaluate spin-label surface accessibility. These software enhancements enable us to better utilize the biophysical data routinely available from various spin-labeling experiments. To illustrate the power and utility of these tools, we report the refinement of an ankyrin:CDB3 complex model that exhibits much improved agreement with the EPR distance measurements, compared to model structures published previously.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2014 PMID： 24758720 PMCID： PMC4018176 DOI： 10.1021/jp4099705

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

Electron paramagnetic resonance (EPR) spectroscopy, when combined with detailed computer modeling, is a powerful and valuable tool for oligomeric protein complex structure determination and can be used to probe structural details for these protein assemblies in the many situations where X-ray crystallography or NMR spectroscopy are not practical or possible. Appropriate EPR experiments for spin-labeled protein complexes can provide distance measurements over a range of 5–80 Å,[1−4] making EPR an ideal technique to study the structures of multiprotein complexes. EPR studies also require markedly smaller sample amounts than NMR experiments or crystallization trials and can be applied routinely to samples in various physical environments, including cell membranes. EPR techniques may be particularly useful when protein complex formation involves highly flexible structures.[5] The measurements obtained from EPR double electron–electron resonance (DEER) experiments can be analyzed in terms of Gaussian distributions[3,4,6] of the distances between the unpaired electrons in the paramagnetic spin labels as given bywhere P is the relative probability of observing a particular distance R, R0 is the center of the Gaussian, and σ describes the width of the Gaussian. This distance distribution reflects an ensemble of conformations and is determined by the complex interplay of global protein conformation, spin-label flexibility, and spin-label interaction with neighboring side chains. As a result, interpretation of these distance distributions based simply on side-chain Cβ–Cβ distances may introduce errors as large as 12–14 Å.[7,8] However, DEER distance measurements are typically more precise.[6] Thus, it should be advantageous to model spin-label side chains explicitly during three-dimensional model construction or assessment. We have previously developed the TagDock software package to treat rigid body docking problems.[9] Here, we describe an enhanced version of the TagDock software suite, which now includes an MTSSL side-chain rotamer library to facilitate explicit spin label modeling in the protein docking calculations. To illustrate the utility of these TagDock enhancements, we present results for detailed structural refinement of an important human erythrocyte cytoskeletal complex formed between ankyrin-R and the cytosolic domain of anion exchange protein AE1, or band 3 (CDB3), using EPR pair distance measurements and spin-label accessibility data[10] to filter candidate complex models. Human erythrocytes are unique among cells in the body due to their unusual biconcave discoidal shape, high structural integrity, and elasticity. These unique properties are essential for long-term survival under the turbulent conditions often present in the circulatory system. These unique structural and mechanical properties are due to the extensive membrane skeleton, composed primarily of actin and spectrin, and the connections formed between the membrane skeleton and integral membrane proteins.[11−13] One major connection type is composed of ankyrin-R and protein 4.2, which form a complex with CDB3.[14−18] Defects in this connection complex, sometimes due to mutations in ankyrin-R, can result in spherical erythrocytes with reduced elasticity and diminished structural integrity.[19] These deformed erythrocytes are more easily ruptured, leading to a class of clinical conditions known collectively as hereditary spherocytosis, manifested by hemolytic anemia and associated problems. A detailed structure for the ankyrin-R:CDB3 complex would help us better understand how various ankyrin or CDB3 mutations alter the skeletal membrane connections and produce spherocytosis. However, as is the case with many heterodimeric protein complexes, there is no high-resolution crystal structure yet for ankyrin-R:CDB3 and we only have high-resolution structures for individual protein components of this connection network. We reported recently a detailed model for the complex formed between the cytoplasmic domain of erythrocyte band 3 (CDB3; PDB code 1HYN) and repeats 13–24 in the membrane binding domain of ankyrin-R (ankyrin-R; PDB code 1N11), using EPR distance measurements[10] to guide a combined automated molecular docking and manual molecular model building exercise. Here we describe a refined ankyrin-CDB3 complex model, generated using the TagDock toolkit, that exhibits much improved agreement with the biophysical data while retaining the general features reported for our previous model. The computational tools and protocols that we present here are applicable to any macromolecular dimer modeling task, when limited biophysical data such as distance measurements are available from EPR, fluorescence resonance energy transfer (FRET), solid-state NMR spectroscopy, or other biophysical techniques. Thus, the methods we describe should have broad practical utility.

Methods

New Developments in TagDock

As described previously,[9] TagDock is a software toolkit that produces structures for macromolecular complexes by generating randomly posed docked pairs (decoys), starting from rigid structures for each monomer. The program first generates all geometrically possible docking poses and then filters all docking solutions using a penalty function that determines the agreement for each pose with available biophysical data, most typically a set of intermonomer distance measurements obtained from, e.g., EPR, NMR, or FRET experiments. In the current work, we extend TagDock to include explicit MTSSL spin labels (Figure 1) at specified amino acid positions, subject to steric constraint considerations. With this option, TagDock computes interresidue distances as Boltzmann-weighted averages of internitroxide spin-label distances determined from all spin-label conformers that can be accommodated at each site. A histogram of distance–density, in 0.1 Å bins, can be created, for qualitative comparison to experimentally determined distance distributions.

Figure 1

Methanethiosulfonate spin-label with χ angles labeled.

Methanethiosulfonate spin-label with χ angles labeled. A rotamer library was constructed using molecular mechanics methods in AMBER v10,[20] to determine a Boltzmann-weighted set of sterically allowed MTSSL spin-label rotamers at each site. While there is considerable precedence for rotamer libraries,[21−24] a general-purpose library for docking calculations derived from molecular dynamics is quite distinct. We used molecular dynamics simulations to observe accessible rotamer conformations for surface-exposed MTSSL side chains in the context of a globular protein structure. We chose T4 lysozyme (T4L) as our model protein system. We modified the T4L structure (PDB entry 2OU8) by replacing native side chains at positions 5, 16, 38, 53, 65, 89, 109, 135, and 144 with MTSSL side chains. These positions were carefully selected such that (a) each position corresponds to a residue site that has been labeled successfully in previous experimental studies, so we know an MTSSL side chain can be accommodated; (b) the side chain was located at the surface of the protein, with the Cα–Cβ bond vector oriented away from the center of mass of the protein; and (c) no spin-label position was close enough to any other modified position such that any two MTSSL side chains could interact directly with each other and thus bias the conformational sampling, allowing us to run simulations with all nine labels included. We used previously published molecular mechanics force field parameters for MTSSL[25] to run these calculations. The χ3 dihedral centered on the disulfide linkage favors values of ±90° and is well-known to have a large energy barrier for interconversion between the two low-energy conformers. Therefore, we ran two simulations, one with χ3 set initially at +90°, and another with χ3 set at −90°, so as not to bias the rotamer states produced by our simulations due to the starting conformations for χ3. Both starting structures (χ3 = +90°; χ3 = −90°) were solvated in a truncated octahedral box of simple point charge (SPC) waters with a 12.2 Å solvent layer in all directions, producing a total system size of ∼42,300 atoms. The protein atoms were then held fixed and the solvent was relaxed with 100 steps of steepest descent minimization, followed by 1900 steps of conjugate gradient minimization. Next, the solvent was held fixed and the protein was minimized and finally, the whole system was minimized without restraints. The system was then heated to 298 K and equilibrated in a 310 ps procedure consisting of six separate molecular dynamics (MD) runs of between 10 and 80 ps each, using the Anderson temperature coupling option. At the beginning of the procedure, solute atoms were restrained to their starting positions with a force constant of 16 kcal/mol, and the MD step size was set to 0.5 fs. Restraints were iteratively decreased and the MD step size was iteratively increased in subsequent runs until the solute atoms were unrestrained and the step size was 1.5 fs. Constant volume dynamics were run for the first 70 ps, switching to constant pressure for the remainder of the procedure. The equilibrated χ3 = −90° and χ3 = +90° systems were then each simulated for 200 ns using constant pressure MD with isotropic position scaling. A trajectory snapshot was saved every 1 ps, yielding 400,000 individual snapshots for each of the nine MTSSL side chains. The resulting 3.6 million MTSSL conformations were extracted from the trajectories for subsequent conformational analysis. The observed values for each of the χ1–χ5 MTSSL dihedral angles over the two 200 ns MD simulations are depicted as polar probability plots in Figure 2. The values for the χ3 = −90° and χ3 = +90° simulations are overlaid in the right-hand panels, illustrating how the χ3 conformation impacts the probability of observing each of the three low-energy states for the other side-chain dihedral angles. The left panels illustrate the overall probabilities for each dihedral value, summed over both simulations. As these data show clearly, our observed populations for χ1–χ5 are nearly identical to similar analysis reported by Polyhach et al.,[22] using simulations of the free MTSSL spin label only. The gauche+ population for χ1 is more prevalent in Jeschke’s simulations compared to our results, but this is the only observable difference in the two studies. This difference is presumably because their simulations were performed for an isolated MTSSL residue while we simulated the spin labels in an intact protein. A gauche+ conformation at χ1 can incur steric clashes with the protein backbone, which is why this conformation is less prevalent in our simulations. Since these two libraries were derived using quite different simulation protocols, the overall excellent agreement between these two independent analyses lends an added measure of confidence to our rotamer libraries (i.e., the rotamer states observed do not appear to be strongly dependent on the exact conformational sampling procedures, potential functions, and so on).

Figure 2

Polar probability plots for χ1–χ5 rotamers of the MTSSL spin label, determined from molecular dynamics simulations. On the right side, probabilities for our simulations with χ3 initialized at +90° (red) and −90° (black) are superimposed. On the left side, the two are summed to give the probability for each dihedral taken from all snapshots of both simulations. Each of the 3.6 million MTSSL side-chain conformations captured during our MD simulations were classified into one of 162 possible rotamer states, defined by binning the measured values for the χ1, χ2, χ4, and χ5 dihedral angles (Figure 1) into g+ (0°–120°), g– (−120°–0°), or trans (120°–240°) conformations, and binning χ3 into ±90° conformations. The rotamer states were then counted and ranked by frequency of occurrence. Then, for each rotamer state, each of the five χ values were averaged over all of the conformers that were classified to that state and stored as the values that define that rotamer conformation. We observed 155 of the possible 162 states, with the most frequent state representing 5.5% of the total population (∼198,000 total observations), while the three least frequent rotamer states were observed only 4 times each. A cutoff was then applied such that all observed rotamer states with a probability density of at least 0.0005% were retained in our model, producing a library of the most frequently observed 146 unique rotamers (Supporting Information Table S1). We also used the same procedure to create rotamer libraries classified using only four dihedrals (χ1–χ4) or three dihedrals (χ1–χ3), which produced 51 and 18 rotamer states, respectively (Supporting Information Tables S2 and S3). The χ4 and χ5 dihedral angles that were not directly used in the classification of the three- and four-dihedral rotamer libraries are still reported as the corresponding averages in those libraries. These simplified rotamer libraries could be used to further enhance computational efficiency but were not used in this study. The rotamer library is input to TagDock as a simple text file, with each line specifying one of our observed MTSSL rotamers by listing its five χ dihedral angles and the probability density for that rotamer. In the present work, 146 rotamers, ranging in probability from 5.5% to 0.0005%, are input in this manner. Each MTSSL rotamer is inserted at each spin-label position in both isolated protein monomers prior to the initial docking step. TagDock then filters each rotamer candidate by applying one of four user-selectable algorithms. The user can elect to simply incorporate all 146 rotamer states at each position (i.e., no bump-checking), or else employ a bump-check algorithm to eliminate rotamers that form unfavorable steric interactions with either backbone (N, C, CA, O) atoms and/or neighboring side-chain heavy atoms. The bump-check algorithm itself is simple and efficient: when the distance between a pair of heavy atoms (i.e., non-hydrogen atoms) is less than 85% of the sum of their van der Waals radii,[22,26] a steric clash is noted and the spin-label rotamer candidate is eliminated on that basis. Rotamers that pass a backbone bump check, but fail when side-chain clashes are considered, can often be accommodated with minor rearrangements of neighboring protein side chains, and we use the SCWRL 4.0[27] software package to make these adjustments objectively and quickly. SCWRL 4.0 leverages a backbone-dependent rotamer library to globally sample a protein’s energetically accessible side-chain conformations. Though SCWRL 4.0 is most frequently utilized for in-silico mutagenesis experiments, it can also be used to repack side chains in response to user-defined, occlusive quasi-atom hard spheres. SCWRL 4.0 takes fixed protein backbone atom coordinates and generates new side-chain heavy atom coordinates, which globally minimize potential energy as calculated from a combination of an anisotropic hydrogen bonding term and van der Waals contacts. Sophisticated computational algorithms in SCWRL 4.0 efficiently manage the geometric and combinatorial challenges of rapid side-chain sampling. Even with large proteins, SCWRL 4.0 typically returns repacked structures within seconds. For each MTSSL rotamer state we consider, we generate a PDB file that contains fixed protein backbone coordinates and the MTSSL side-chain atoms defined as a collection of hard spheres. We then use SCWRL to globally repack all remaining protein side chains around this hard-sphere site. The resulting repacked protein structure is again bump-checked by TagDock, and all MTSSL rotamer states that pass this second side-chain atom bump-check step are added to the set of accessible rotamers associated with that spin-label position. Using this protocol to iterate over all MTSSL rotamer states that fail the initial side-chain bump check, TagDock identifies the largest plausible set of MTTSL rotamer conformations at each spin-label position. All MTSSL rotamer conformations selected for each monomer partner are retained in all subsequent docking trials, with no further adjustment or reevaluation during the docking pose calculations. For each decoy pose generated, interresidue distances are calculated for all accepted spin-label rotamer states at each position, using the N–O bond vector midpoint for each spin-label residue. This calculation yields a probability-weighted distance distribution for each residue pair (i.e., a Boltzmann-weighted distance distribution, since our rotamer state populations are derived from long MD trajectories). We emphasize that these distance distributions will not necessarily reproduce the fine detail of an experimental DEER distribution profile, as we do not at present include local backbone dynamics in these calculations, but backbone motion will make a significant contribution to the distribution profile in some cases. We have shown in previous studies for a large number of test examples[9] that we can almost always obtain good docking results without inclusion of backbone dynamics, provided we generate a sufficiently large ensemble of docking poses during the coarse-docking phase so that the intermonomer distances populate the full distance distribution ranges observed in the DEER experiments.

Scoring Function

Structure refinement protocols that utilize distance data require a metric that can efficiently and objectively evaluate structures, both for ranking the relative quality of structures (i.e., how well do the structures fit the experimental data) and for automatic filtering of large numbers of structures (i.e., data reduction via automatic elimination of structures that are inconsistent with the experimental data). Each experimental distance restraint is typically represented as a distance distribution or possibly a distance range, rather than a discrete distance value, so we need a scoring function that is designed specifically to handle these types of distance restraint data. The distance data generated in NMR NOE experiments likewise are represented as distance ranges, since the NMR experiments also sample an ensemble of conformations that each have unique values for any specific interatomic distance. The harmonic penalty function we used previously for filtering structures is not well suited to these types of distance data sets, since a harmonic function “emphasizes” the median distance value, while penalizing distances that deviate from the median but still fall within the experimental distance range. Therefore, we have implemented a flat-bottomed harmonic scoring function, similar to those used widely by the NMR structure refinement community.[20,28−30] Any distance that falls within the allowable range specified by the “flat-bottomed” region of this function incurs no penalty. This scoring function has the formwhere N is the number of individual restraints and v is the individual penalty score,where r is an experimental distance, r is the corresponding distance in the decoy being scored, σ is the associated width of the distance distribution, and hrange is the width of the range over which the scoring function is harmonic. The default value we use for hrange is 4 Å, but it is a user-tunable parameter. Outside of the range defined in eq 3, (r – σ – hrange) < r < (r + σ + hrange), the scoring function becomes linear with a slope equal to that at the point (r + σ + hrange) or (r – σ – hrange), illustrated in Figure 3. The utility of this function for structure selection and/or refinement using experimental distance range data is well-documented.[28−30]

Figure 3

Illustration of the flat-bottomed harmonic scoring function with r = 30 Å, σ = 1 Å, and hrange = 4 Å. The flat-bottomed harmonic score is calculated as 0 for r within σ of r, harmonic between σ and σ + hrange from r, and linear outside this region. The value of 1 Å for σ is similar to the experimental values of σ found in Supporting Informaton Table S4. We have also implemented a feature to compute nitroxide spin-label surface accessibility at labeled positions. This option is accomplished with a simple script that exports a PDB structure file and then uses an external program to calculate solvent accessibility. We use the MSMS program[31] for this operation, but the user can specify any suitable surface area calculation software for this calculation via simple modification of the control script. As with inter-residue distance measurements, residue surface accessibility data are not used as restraints during the docking procedure, but rather used in the filtering step to determine which docking poses are consistent with the experimental accessibility data.

Results

Using the published experimental distances and distributions[32] (Table S4, Supporting Information), we first generated structures for the CDB3 homodimer. We followed the computational protocol reported in previous test calculations.[9] Ideally, we would begin docking calculations with the crystal structures for the independent protein monomers, but there are neither X-ray nor solution-phase structures for CDB3 monomer. Since the individual CDB3 monomer chains in the homodimer crystal have similar, but unique, conformations, designated P and Q, a docking calculation that begins with these unique structures would clearly be biased and would not provide a meaningful assessment of program performance. We therefore performed a 10 ns molecular dynamics simulation with a Generalized Born continuum solvent model using AMBER.[20] We applied weak positional restraints to maintain backbone atoms near the crystallographic coordinates for P and Q monomers, while allowing side-chain atoms to move freely, and then used the final configurations for P and Q monomers from the dynamics trajectory to begin docking calculations. First, a coarse-resolution search was performed. The second molecule of the dimer complex was randomly and repeatedly rotated and translated onto the surface of a large virtual sphere centered on the first molecule. Next, the shortest intermonomer Cβ distance was calculated, and the second molecule was translated toward the first molecule by the distance, with no specific tests for unreasonable physical overlaps at this stage. We generated 5 × 105 random docking poses and then selected the 200 poses that best satisfied the experimental distance measurements. We next performed higher resolution docking searches, using these 200 poses as starting structures. The high-resolution docking phase entailed a series of more focused docking attempts, restrained by progressively smaller translation and rotation angle values, all driven by a Monte Carlo optimization procedure. During the first phase of high-resolution docking, translation moves were restricted to a range [0.0–3.0 Å] and rotation about each axis to angle values from the range of 0.0–15.0°. These search ranges were tightened, or “focused” during subsequent iterations until the translation moves were restricted to the range of 0.0–1.0 Å and rotations to the range of 0.0–1.0°. We retained the 200 top docking poses, i.e., those poses that gave the best agreement with the experimental distance measurements, for final analysis. We performed two sets of test calculations: one with simple Cβ–Cβ distance filtering reported previously,[9] and a second calculation using explicit all-atom spin labels and our new rotamer library (together with the SCWRL option as described in Methods). In both cases, we generated 500,000 decoys using individual monomers from the homodimer crystal structure and performed focused refinement on the best 200 decoys, scoring with the flat-bottom harmonic function. We then plotted the penalty score for each of the 200 refined docking poses versus the Cα root mean square deviation (RMSD) for that pose relative to the homodimer crystal structure, as shown in Figure 4. The plot shows that both filtering strategies yield two sets of docking poses that cluster tightly around two different penalty score values (we should note that there is no statistically significant difference between the two scores for either filtering method). However, the filtering calculation based on explicit spin-label side-chain modeling also produced a third penalty score cluster with smaller RMSD values relative to the crystal structure. Specifically, filtering based on simple Cβ distances yields an optimal pose with a 4.0 Å RMSD relative to the crystal structure, while the optimal pose we generate when we use explicit spin-label side chains and the rotamer library for conformational sampling has a 2.2 Å RMSD relative to the crystal structure, as shown in Figure 5. We should note that the Cβ filtering results reported here are based on the new flat-bottom harmonic scoring function, so the results differ slightly from those reported previously.[9]

Figure 4

Figure 5

Cα backbone stereoplots for TagDock results shown in Figure 4, overlaid on the crystal structure of the CDB3 homodimer (gray, PDB 1HYN). Panel A displays the best structure (red) from the Cβ distance filtering method (Cα RMSD ∼ 4.0 Å). Panel B shows the best structure (blue) from the explicit spin-label rotamer sampling method (Cα RMSD ∼ 2.2 Å).

Cα RMSD with respect to the crystal structure of CDB3 for the top 200 TagDock results from the rigid Cβ algorithm previously published[9] (red) and incorporating the MTSSL rotamer library (blue). In both cases, we use the flat-bottom harmonic scoring function. We see similar degeneracy for both algorithms; however, the incorporation of the spin-label rotamer library yields a third, lower RMSD cluster. In both cases we used all experimental distances and distributions from Zhou et al.[32] Cα backbone stereoplots for TagDock results shown in Figure 4, overlaid on the crystal structure of the CDB3 homodimer (gray, PDB 1HYN). Panel A displays the best structure (red) from the Cβ distance filtering method (Cα RMSD ∼ 4.0 Å). Panel B shows the best structure (blue) from the explicit spin-label rotamer sampling method (Cα RMSD ∼ 2.2 Å). We next generated structures for the ankyrin-CDB3 complex, starting with crystal structures for the isolated ankyrin-R fragment (repeats 13–24)[33] (PDB code 1N11) and CDB3[34] (PDB code 1HYN), and used both the side-chain Cβ approximation and explicit spin-label side-chain modeling for distance measurements. We used the experimental distances and distributions published previously[10] to filter results from both calculations with our new flat-bottomed harmonic scoring function. In Figure 6, we show the docking pose closest to the geometric mean of the top 200 candidates for each filtering protocol, as well as the structure closest to the geometric mean of the top 30 poses generated previously with RosettaDock and manual modeling building.[10] The top 30 models reported previously have scores ranging from 285 to 470, while the geometric mean score is 53 for the Cβ–Cβ method and 22 for the explicit spin-label rotamer method, where lower scores indicate better agreement with the experimental distance data.

Figure 6

CDB3 dimers docked to ankyrin. Three structures are compared, with the ankyrin subunits (dark gray) superimposed. In gray, we show the docking result that is closest to the mean from a previously published RosettaDock calculation after applying a nonautomated filtering procedure.[10] In red, we show the closest to mean TagDock result from the rigid Cβ algorithm. In blue, we show the TagDock result using the MTSSL rotamer library. In all cases, we used the 20 experimental distances reported previously[10] to filter our docking pose solutions. We also generated CDB3-ankyrin complex structures using subsets of the DEER distance data, to assess solution “convergence” as a function of total interresidue distances used in the filtering step. Each spin-label position in one partner that had distances measured to multiple positions in the other partner became a restraint set; e.g., CDB3 70–ANK 707 and CDB3 70–ANK 722 is one distance subset of the full data set. Table 1 shows the Cα RMSD for the structure closest to the geometric mean of the top 200 refined TagDock poses computed with various distance data subsets, compared to the structure closest to the geometric mean of the top 200 refined TagDock poses generated when all experimental distances are used for solution filtering. In some cases, a distance subset contains as few as two interresidue distances for filtering, so it is not surprising that the Cα RMSD values for these subsets are quite large. For the limited-distance data subsets only, we also computed spin-label solvent accessibility for each docking pose and compared these results to the experimental accessibility measurements[10] as an additional filter, to test if solvent accessible surface area data can be used effectively to improve the filtering process. When we used the solvent accessibility filter together with limited-distance data subsets to eliminate docking poses that were inconsistent with the experimental data, we observed improved Cα RMSD values for the mean structures in most cases, as well as dramatically smaller pairwise Cα RMSD values for all remaining poses that passed both the distance and accessibility filters (i.e., the docking pose solution set is more tightly clustered about a mean structure), suggesting solution convergence within the limits of the available experimental data.

Table 1

Pairwise Cα RMSD Results (Å) for Subsets of Our Full Distance Restraint Data Set, with and without Solvent Accessible Surface Area (SASA) Filteringa

position	no. of restraints	distance	distance + SASA
CDB3 70	2	17.3 (18.5)	18.5 (14.2)
CDB3 130	5	2.9 (17.8)	6.2 (13.4)
CDB3 166	4	19.9 (15.7)	18.8 (14.0)
CDB3 254	3	16.4 (15.2)	10 (13.6)
CDB3 302	2	24.4 (23.4)	9.9 (14.0)
ANK 509	2	23.9 (19.2)	18.5 (16.8)
ANK 524	3	27.3 (18.9)	10.8 (11.4)
ANK 608	3	3.7 (15.5)	3.9 (14.9)
ANK 623	4	20.1 (18.1)	11.1 (14.3)
ANK 707	2	12.5 (18.5)	3 (15.3)
ANK 722	2	16 (19.8)	9 (13.0)

The closest structure to the mean of each ensemble is compared to the reference structure obtained by refining the CDB3-ankyrin structure with all available restraints. The average pairwise RMSD for each ensemble is shown in parentheses. For reference, the pairwise Cα RMSD for the ensemble generated when all experimental distances are used is ∼0.4 Å. In most cases, reducing the total number of distances used for the filtering step yields docking poses that deviate significantly from the results obtained when all 20 distances are used for filtering, and the structural variation among the top 200 docking poses is considerable. When SASA data[10] are included in the filtering step (column 4), the docking poses obtained are generally much closer to the solution generated with the full 20-distance restraint set, illustrating the potential utility of these data to help select the best docking models.

Discussion

We emphasize that the goal of our refinement procedure is not to precisely match a given distance, nor should it be. The results obtained in the EPR experiments are interpreted as Gaussian distributions of the distances from an ensemble of structures, and this fact greatly influenced our choice of a flat-bottomed harmonic scoring function. This function does not “penalize” any structure that falls within the Gaussian distribution, unlike a simple harmonic function that favors the mean value of the distribution. We do not expect a single structure to simultaneously match the center of the Gaussians for all 20 distances measured for the ankyrin-CDB3 complex, but we do expect that a small ensemble of structures will reproduce the experimental distance distributions, as has been reported in a previous study of a synaptotagmin-SNARE complex by Lai et al.[35] In our previous study with a large database of protein heterodimer complexes, we rarely observed any docking models with low penalty scores that exhibit any notable distance violations.[9] In only a few cases have we obtained a docking solution with a good penalty score, i.e., excellent overall agreement with experimental distance data, which nonetheless displayed one pair distance that deviated significantly from the experimental value. For one of these cases, the pair-distance discrepancy was due to an experimental error, and the experimental result was subsequently retracted. In the remaining cases, we strongly suspect the isolated pair-distance discrepancies are likely due to our rigid backbone docking protocol. Proteins are inherently flexible, and at a minimum, minor structural fluctuations are to be expected. Our earlier studies with a large data set of protein heterodimer complexes suggests that we can generally obtain good docking models without explicit inclusion of backbone flexibility.[9] However, in cases where large conformational changes occur during complex formation, it will likely be important to include the impact of local backbone flexibility on spin-label distance distributions to achieve better agreement with EPR DEER experiments. It was not necessary to include backbone flexibility for the CDB3 homodimer docking calculations reported here because of the computational protocol we used to generate the isolated CDB3 monomer structures. The large size and inherent flexibility of the MTSSL spin labels can also produce local structural fluctuations that could easily alter simulated distances by several angstroms, and this was our primary motivation to incorporate an MTSSL rotamer library for explicit side-chain conformational sampling in the TagDock toolkit. Our results for the CDB3 homodimer show clearly that explicit consideration of MTSSL side-chain conformations during the solution filtering step can yield much better docking poses than a simple Cβ–Cβ distance assessment, in this case, CDB3 homodimer models that are considerably closer to the reference crystal structure. We emphasize that the MTSSL rotamer library must be coupled with an efficient method to repack protein side chains adjacent to the spin-label site, and we use the SCRWL application for this task. At some residue positions, we do not identify any allowable spin-label side-chain rotamers until we have performed extensive repacking for all neighboring amino acid side chains. Failure to identify and consider all possible spin-label rotamers at each labeled site will almost certainly reduce the number of candidate docking poses that pass the experimental distance distribution filtering step, and lead to the inappropriate exclusion of viable docking pose solutions. The TagDock toolkit is not intrinsically limited to experimental distance measurements for solution filtering.[9] Other metrics, such as side-chain solvent accessible surface area, can be used for the filtering step as well. Our ankyrin:CDB3 docking results, using limited distance measurements combined with spin-label solvent accessibility measurements, illustrate the potential power and effectiveness of combined filtering metrics. When solvent accessibility data are used together with limited distance measurements for filtering, we achieve in many cases a docking solution ensemble that is closer to the result obtained with the full distance restraint data set than we could achieve with limited distance data alone. This result is significant, because distance measurement based on site-directed spin labeling is a labor-intensive procedure, and any strategy that might yield comparably good docking results with fewer experimental measurements has obvious appeal. Of course, solvent accessibility is a useful filtering metric only when the specific residue positions are impacted by the protein docking interface. Otherwise, the solvent accessibility profiles provide little or no additional useful information. The refined ankyrin-R:CBD3 complex reported here exhibits much better agreement with the DEER distance and spin-label accessibility measurements, compared to our earlier model.[10] These improved results are directly attributable to the TagDock toolkit design and implementation. Since TagDock calculations are fully automated and require no user intervention for model generation or scoring, we avoid potential biases that might be introduced with any manual model building or model selection procedures, unlike our previously published work. Furthermore, since TagDock exhaustively considers all geometrically possible docking poses during the initial docking step, we are far more likely to identify all solutions that satisfy the experimental restraints. As reported previously,[9] solutions sets typically converge to a well-defined consensus structure as additional experimental restraints are added in the filtering step, providing a convenient metric to determine the optimal solution (within the limits of the experimental data used for the filtering process). An alternate model reported previously for the ankyrin-CDB3 complex[36] places CDB3 more directly along the concave surface region of the ankyrin-R fragment. While our automated docking protocol sampled this region extensively, it did not identify any docking poses in the region that provide good agreement with the DEER distances or solvent accessibility measurements. We should note that our previous RosettaDock calculations,[10] when performed without distance restraints, did generate docking poses in the ankyrin-R concave surface region that scored well, but these poses failed to satisfy the DEER distance restraints.

Conclusion

Using a combination of EPR distance measurements and rigid docking simulations, we have defined a fast and efficient method for determining and refining the structure of a complex between two proteins based on EPR distance data. For our specific test cases, we studied the CDB3 homodimer and the complex between CDB3 and ankyrin-R. We show that a small distance restraint data set such as that used in this study is sufficient to reliably dock two rigid proteins. Furthermore, sampling spin-label conformations with our new custom MTSSL rotamer library improves agreement with the crystal structure of the CDB3 homodimer complex, compared to calculations based on a simple Cβ–Cβ distance approximation. In cases where one or more of the individual protein partners undergoes significant conformational changes, this strategy will likely not be sufficient to refine an atomic resolution model. In these cases, a simple rigid body docking strategy to generate initial model candidates will likely be inadequate, and efficient strategies for flexible protein docking calculations will be needed. We are currently evaluating several computational schemes for this purpose, using structurally well-characterized protein–protein complexes as test systems. Local protein backbone conformational sampling will be crucial for these situations, and we are testing several modified MD simulation methods that are more computationally efficient than traditional equilibrium MD simulations, such as the accelerated MD method of Hamelberg et al.[37] There are of course many good protein docking programs available now, as we reviewed in a previous publication.[9] However, there are only a few protein docking programs or tools that model spin-label side chains explicitly when utilizing EPR data. RosettaEPR[38] has this capability, as does the MtsslSuite[39] available in the PyMol graphics package.[40] However, the spin-label rotamer libraries employed in these tools are derived primarily from existing crystal structures. At the moment, there are not enough crystal structures with MTSSL side chains to derive rotamer libraries that yield the same statistical confidence levels as, e.g., the SCRWL amino acid rotamer libraries.[27] As discussed above, this was our primary motivation to develop a Boltzmann-weighted MTSSL rotamer library from extensive simulation data. The MMM software package[22] is the only other tool that utilizes an MTSSL rotamer library similar to ours and that can incorporate explicit spin-label rotamer sampling along with sampling of neighboring side-chain rotamers, in protein docking calculations. However, TagDock differs substantially from all other protein docking programs in that it incorporates automated distance-difference matrix analysis to analyze results and suggest additional experimental measurements.[9,41] This capability, coupled with the computational performance (TagDock can generate ∼1 × 106 docking poses in 1 h on a single-processor Linux workstation), makes TagDock particularly useful for rapid docking model generation and assessment.[9] For these reasons, we promote the package primarily as a tool to facilitate experiment planning, rather than a replacement for other capable protein docking programs. Of course, in cases where the rigid body docking protocol is reasonable (i.e., cases where the individual monomers do not undergo significant conformational changes during complex formation), TagDock is also an effective and efficient protein docking program.[9] The software extensions reported here that enable us to better utilize EPR distance data further enhance TagDock’s utility as an experiment planning tool. The results presented in this work and in an earlier publication[10] for the ankyrin-CDB3 complex demonstrate clearly that detailed, chemically reasonable three-dimensional structural models can be generated for protein–protein complexes using a small number of accurate, long-range distance measurements and a relatively simple, inexpensive computational refinement strategy, provided there is sufficient information to define the internal three-dimensional structures for the individual protein partners used as starting structures for the automated docking calculations. The experimental data used in this study for solution filtering (DEER distance and spin-label solvent accessibility measurements) can be generated for many other protein complexes, and the TagDock toolkit can be used for any oligomer docking project where reliable three-dimensional structures are available for the individual monomers, so the procedures and strategy presented here should be widely applicable for many biomolecular oligomer structure refinement problems.

37 in total

1. Crystallographic structure and functional interpretation of the cytoplasmic domain of erythrocyte membrane band 3.

Authors: D Zhang; A Kiyatkin; J T Bolin; P S Low
Journal: Blood Date: 2000-11-01 Impact factor: 22.113

Review 2. Spectrin and ankyrin-based pathways: metazoan inventions for integrating cells into tissues.

Authors: V Bennett; A J Baines
Journal: Physiol Rev Date: 2001-07 Impact factor: 37.312

3. Measurement of large distances in biomolecules using double-quantum filtered refocused electron spin-echoes.

Authors: Peter P Borbat; Jared H Davis; Samuel E Butcher; Jack H Freed
Journal: J Am Chem Soc Date: 2004-06-30 Impact factor: 15.419

Review 4. Ankyrin-based patterning of membrane microdomains: new insights into a novel class of cardiovascular diseases.

Authors: Kent R Nilsson; Vann Bennett
Journal: J Cardiovasc Pharmacol Date: 2009-08 Impact factor: 3.105

5. Restriction by ankyrin of band 3 rotational mobility in human erythrocyte membranes and reconstituted lipid vesicles.

Authors: A Che; I E Morrison; R Pan; R J Cherry
Journal: Biochemistry Date: 1997-08-05 Impact factor: 3.162

Review 6. Molecular basis of red cell membrane disorders.

Authors: Jean Delaunay
Journal: Acta Haematol Date: 2002 Impact factor: 2.195

7. Synaptotagmin 1 and SNAREs form a complex that is structurally heterogeneous.

Authors: Alex L Lai; Hao Huang; Dawn Z Herrick; Natalie Epp; David S Cafiso
Journal: J Mol Biol Date: 2010-11-16 Impact factor: 5.469

8. Identification of contact sites between ankyrin and band 3 in the human erythrocyte membrane.

Authors: Jesse L Grey; Gayani C Kodippili; Katya Simon; Philip S Low
Journal: Biochemistry Date: 2012-08-14 Impact factor: 3.162

9. On the origin of large flexibility of P-glycoprotein in the inward-facing state.

Authors: Po-Chao Wen; Brandy Verhalen; Stephan Wilkens; Hassane S Mchaourab; Emad Tajkhorshid
Journal: J Biol Chem Date: 2013-05-08 Impact factor: 5.157

10. RosettaEPR: rotamer library for spin label structure and dynamics.

Authors: Nathan S Alexander; Richard A Stein; Hanane A Koteiche; Kristian W Kaufmann; Hassane S McHaourab; Jens Meiler
Journal: PLoS One Date: 2013-09-05 Impact factor: 3.240

3 in total

1. Characterization of the Domain Orientations of E. coli 5'-Nucleotidase by Fitting an Ensemble of Conformers to DEER Distance Distributions.

Authors: Ulrike Krug; Nathan S Alexander; Richard A Stein; Antje Keim; Hassane S Mchaourab; Norbert Sträter; Jens Meiler
Journal: Structure Date: 2015-12-24 Impact factor: 5.006

2. Rapid Simulation of Unprocessed DEER Decay Data for Protein Fold Prediction.

Authors: Diego Del Alamo; Maxx H Tessmer; Richard A Stein; Jimmy B Feix; Hassane S Mchaourab; Jens Meiler
Journal: Biophys J Date: 2019-12-18 Impact factor: 4.033

3. Protein functional dynamics from the rigorous global analysis of DEER data: Conditions, components, and conformations.

Authors: Eric J Hustedt; Richard A Stein; Hassane S Mchaourab
Journal: J Gen Physiol Date: 2021-09-16 Impact factor: 4.086

3 in total