
Classification of heterodimer interfaces using docking models and construction of scoring functions for the complex structure prediction.

Yuko Tsuchiya1, Eiji Kanamori, Haruki Nakamura, Kengo Kinoshita.   

Abstract

Protein–protein docking simulations provide predicted structural models of complexes. In a docking simulation, scoring functions select several putative models from a large ensemble of complex models. Scoring functions based on statistical analyses of heterodimers are usually designed to select, as the correct model, the complex model with the interaction mode most abundant among the known complexes. However, because the formation schemes of heterodimers are extremely diverse, a single scoring function does not seem sufficient to describe the fitness of predicted models whose interaction mode is not the most abundant one. Thus, it is necessary to classify heterodimers by their individual interaction modes and then to construct a separate scoring function for each heterodimer type. In this study, we constructed a classification method for heterodimers based on the characters that discriminate near-native models from decoy models, which were found by comparing the interfaces in terms of complementarity for hydrophobicity, electrostatic potential, and shape. Consequently, we found four heterodimer clusters and constructed multiple scoring functions, each optimized for one cluster. Our multiple scoring functions were then applied to predictions in unbound docking.

Keywords:  CAPRI; classification of heterodimers; prediction of complex structures; protein-protein docking; scoring functions

Year:  2009        PMID: 21918618      PMCID: PMC3169947          DOI: 10.2147/aabc.s6347

Source DB:  PubMed          Journal:  Adv Appl Bioinform Chem        ISSN: 1178-6949


Introduction

Many biological functions of proteins occur through specific recognition among protein molecules. Knowledge of protein–protein interactions, particularly three-dimensional structural information on protein–protein complexes, is crucial for understanding the biochemical and physiological functions of proteins.1–3 Recently, the number of tertiary structures of protein complexes has been increasing through the efforts of structural biologists; however, it is still smaller than the number of known protein–protein interactions.4–6 Therefore, the precise prediction of protein complex structures is required for further experimental studies. Protein–protein docking simulation is one of the popular approaches for predicting protein complex structures.7–9 Docking procedures generally consist of two main steps, a sampling step and a subsequent scoring step. A large number of complex models are generated in the former step. The problem of searching the high-dimensional conformational space to create a collection of complex models has been studied by various research groups.10–19 However, there are still several issues to overcome, such as the introduction of conformational flexibility in the generation of near-native models for targets with large conformational changes.9,20,21 In the latter step, near-native models are selected with a scoring function from the many complex models generated in the former step. The various scoring functions that are presently available evaluate complex models in terms of the surface complementarity22,23 along with the electrostatic filter,10,11,24–26 the atomic contact energy (ACE)27 or the statistical potentials based on the pairs of interacting residues,28–30 including hydrogen bonds and van der Waals interactions. 
However, the selection of correct solutions is not easily performed in the structure predictions of many different heterodimers.9,21 As previous studies have pointed out,1–3 heterodimer complexes vary not only in biological function and three-dimensional structure, but also in interaction mode. For example, there are heterodimers with electrostatic dominant interfaces, those with hydrophobic dominant interfaces, and those without interfaces but with high or low shape complementarity. In contrast, the scoring functions based on the statistical analysis of heterodimer interactions are usually designed to select the complex models with the most abundant interaction mode in the known complexes, and thus a single scoring function will not be enough to evaluate the diverse protein–protein interfaces. In addition, the identification of the interaction modes, ie, the classification of heterodimer complexes, has usually been performed based on the interface characters observed in experimentally determined structures of heterodimers. However, to build a native dimer structure, information about the difference between noninteracting sites and interacting sites will be more important, because even a weak interface can be the native interface if no better interface exists. Several pioneering works have already proposed multiple scoring functions optimized for each type of protein function.10,31–33 However, they focused on only two types: enzyme–inhibitor and antibody–antigen complexes. The other heterodimers, such as those related to signal transduction and gene transcription and translation, were classified as other types.32,34 This is probably because the small numbers of known complex structures make it difficult to find the functional similarities between these heterodimers and to categorize them. 
Thus, the classification of heterodimers by using information other than that of protein functions will facilitate the construction of the multiple scoring functions. In this study, we addressed the problem of selecting the correct solutions from the many complex models in the scoring step, by considering the various features of the heterodimers. First, we classified the native interacting sites by considering decoy structures, where the search for the parameters of the scoring functions to discriminate the near-native and the decoy models was carried out. As a scoring function, we used a linear combination of the weighted values of three complementarity scores for the hydrophobicity, the electrostatic potential, and the shape at the protein–protein interface.35 This function indicates the total degree of complementarities for the three surface features over the interfaces. The four heterodimer clusters were found according to our classification scheme. Four scoring functions were then constructed as multiple scoring functions where each function was optimized for each heterodimer type.

Materials and methods

Training dataset

Native heterodimer complexes

The X-ray crystal structures of heterodimers, according to the biological units described in the header of the Protein Data Bank (PDB),36 which have 2.5 Å or better resolution and consist of two protein chains with more than 30 residues and a sequence identity lower than 85% by FASTA program,37 were extracted from the PDB in April 2006. Among these structures, 122 representative heterodimers from each SCOP family class38 were finally selected. These entries are listed in Supplementary Tables 1 and 2. We referred to these experimentally determined complexes as the native complexes.

Complex models generated by the sampling method

Up to 500 models for each heterodimer entry were generated by using our sampling method39 in the bound–bound docking, where the structures of two protomers derived from the complex structure were used. This method generates complex models by optimizing an objective function, which evaluates the shape complementarity of the molecular surfaces of the two component protomers from the angles between the normal vectors at the vertices on their molecular surfaces, and, when required, the sequence conservation of the surface residues calculated by the evolutionary trace (ET) analysis.40 The sequence conservation information was not used for generating the complex models in this section, because there are cases where such information is not effective in identifying the interacting region, and cases where a sufficient number of homologous sequences cannot be obtained to calculate the sequence conservation.39 However, we used conservation information to construct one of the two test datasets, as described in the next section. The optimization of the objective function was accomplished by using a genetic algorithm in combination with Monte Carlo sampling. The final models were selected so that each model had a ligand-rmsd (L-rmsd) larger than 3.0 Å from any other model. Note that the smaller protomer in a complex structure is referred to as the ligand protein, and the larger protomer as the receptor protein. The rmsd is the root mean square deviation of one structure from another structure. The ligand-rmsd is the rmsd between the ligand proteins in two complex models when the receptor proteins are superimposed. Since the above sampling procedure yielded no correct solutions, in other words, no near-native models, for 43 entries, we carried out Monte Carlo sampling of the complex models starting from the native structure to obtain conformations around the native conformation, and we used these conformations as the near-native models. 
The Monte Carlo sampling was also performed so that each model had an optimized objective function and an L-rmsd smaller than 10.0 Å from the native complex. It should be noted that in the CAPRI experiments, the submitted models with an L-rmsd smaller than 10.0 Å from the correct answer are judged as the successful models.41,42 Then, all models were energy minimized by the myPresto program.43 In one entry, no model was successfully minimized due to many clashes. Therefore, we decided to exclude this entry from the dataset. Consequently, both near-native models and decoy models could be prepared for 121 heterodimer entries. The total numbers of the near-native and the decoy models in the 121 heterodimer entries are 404 and 60,238, respectively. The optimized objective function39 was used as an indicator of the quality of a complex model concerning the area and the shape complementarity in the contact region. Since the magnitude of the objective function differs entry by entry, the ratio of the optimized objective function of the model to that of the native complex, called the “relative docking-score”, was considered, where the objective functions of the native complexes were calculated in the same way as that in the sampling method. The relative docking-score = 1.0 means that the complex model had an interface as good and large as that in the native complex. In this study, we defined the complex models with an L-rmsd smaller than 10.0 Å from the native complex and a relative docking-score higher than 0.95 as the near-native models.
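The L-rmsd and the 3.0 Å diversity requirement described above can be illustrated with a minimal sketch. It assumes that the receptors of all models are already superimposed and that ligand atoms are listed in the same order; the function names and the greedy best-first order are hypothetical choices for illustration, not the procedure of the original sampling program.

```python
import math

def ligand_rmsd(ligand_a, ligand_b):
    """RMSD between two ligand coordinate sets (lists of (x, y, z) tuples).

    Assumes the receptors of both models are already superimposed and that
    atoms appear in the same order in both lists.
    """
    if len(ligand_a) != len(ligand_b) or not ligand_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(ligand_a, ligand_b)
    )
    return math.sqrt(squared / len(ligand_a))

def select_diverse(models, min_rmsd=3.0, max_models=500):
    """Keep models that are more than min_rmsd apart from every kept model.

    `models` is a list of (objective_value, ligand_coords) pairs. Visiting
    models from the best to the worst objective value is an assumption made
    for this sketch.
    """
    kept = []
    for score, coords in sorted(models, key=lambda m: m[0], reverse=True):
        if all(ligand_rmsd(coords, other) > min_rmsd for _, other in kept):
            kept.append((score, coords))
            if len(kept) == max_models:
                break
    return kept
```

With this filter, a model lying 1 Å from a better-scoring model is dropped, while one lying 4 Å away is retained.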

Heterodimers used in developing scoring functions

The 121 heterodimer entries were divided into two groups: one contained 47 entries, and the other contained 74 entries. In the former 47 entries, the complex model with the largest relative docking-score was the near-native model. On the other hand, in the latter 74 entries, the model with the largest relative docking-score was not the near-native model, and there were some “false positive models”, which we defined as the complex models with L-rmsds of 10.0 Å or greater from the native complexes and with relative docking-scores higher than 0.95. For the latter 74 entries, scoring functions that evaluate the complex models by factors other than the contact area are required to select the correct solutions. We considered that the number of false positive models is related to the difficulty in the selection of the correct solutions, and that it may be advantageous to develop multiple scoring functions by using the latter cases. Thus, we examined the number of false positive models in the set of complex models for the 121 heterodimer entries. In the 47 heterodimer entries, no false positive model was obtained, as shown in Figure 1A, which provides an example of the relation between the L-rmsd and the relative docking-score of each complex model. The native complexes of these entries are entangled, as in a swapping dimer or a dimer with a loop wound around it. The regions corresponding to the entangled loops in the complex state are usually flexible or disordered in the monomer state, and these regions become fixed or ordered when the complex is formed. In the bound–bound docking, these entries will not yield any false positive complex models due to their tangles. On the other hand, in the unbound–unbound docking it will be difficult to generate the near-native models, because the monomeric structures of the protomers are used, which may have flexible loops or disordered regions. 
Thus, in these entries, the near-native models will be selected based only on their contact area without ranking of the complex models by the scoring functions. These 47 entries are listed in Supplementary Table 1.
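The thresholds used in this section (L-rmsd of 10.0 Å and relative docking-score of 0.95) can be summarized in a short sketch. The function names are hypothetical, and labelling a model with a small L-rmsd but a low relative docking-score simply as a decoy is an assumption of this example.

```python
def relative_docking_score(model_objective, native_objective):
    """Ratio of the optimized objective of a model to that of the native complex."""
    return model_objective / native_objective

def classify_model(l_rmsd, rel_score, rmsd_cut=10.0, score_cut=0.95):
    """Label one complex model using the thresholds stated in the text."""
    if rel_score > score_cut:
        # High-scoring models are near-native when close to the native
        # complex and false positives when 10.0 A or more away from it.
        return "near-native" if l_rmsd < rmsd_cut else "false positive"
    # Assumption: every remaining model is treated as an ordinary decoy.
    return "decoy"

def needs_ranking(models):
    """True if an entry has at least one false positive model.

    `models` is a list of (l_rmsd, relative_docking_score) pairs.
    """
    return any(classify_model(r, s) == "false positive" for r, s in models)
```

Under this reading, the 47 entries correspond to `needs_ranking` returning False and the 74 training entries to it returning True.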
Figure 1

A) An example of the heterodimers that do not need ranking of complex models to select the near-native models. The scatter plot shows the relation between the L-rmsd from the native complex and the relative docking-score in each model, in the heterodimer entry. As this plot shows, this entry, the heterodimer between chains B and F of 1or7 (RNA polymerase sigma-E factor and its negative regulatory protein),68 has no model with a 10.0 Å or greater L-rmsd and a higher relative docking-score than 0.95. B) An example of the heterodimers that need ranking of complex models for the selection of the near-native models. This heterodimer, chains A and B of 1ksh (arf-like protein 2 and 3′,5′-cyclic phosphodiesterase delta-subunit),69 has many models with large L-rmsds and high relative docking-scores.

The other 74 heterodimer entries have at least one false positive decoy, as shown in Figure 1B, where there are many false positive models with various L-rmsds. The native complexes of these entries have either convex and concave surfaces or almost flat surfaces in the interacting regions. These entries may require the evaluation functions other than the contact area to select the correct solutions, and therefore, they could be suitable for the development of scoring functions. Consequently, we decided to use these 74 heterodimer entries, listed in Supplementary Table 2, as the training entries to construct the scoring functions. For each of these heterodimers, 491.8 complex models, including 4.4 near-native models, were obtained on average.

Test datasets

The CAPRI targets

For the CAPRI targets T12, T18, T21, T25, and T26, the complex models were generated from two initial structures in unbound–bound forms (targets T12, T18, and T25) or in unbound–unbound forms (targets T21 and T26) by using our sampling method, where the ET scores were included in the objective functions.39 We used the models with an L-rmsd smaller than 10.0 Å from the native complexes and with any relative docking-scores as near-native models (summarized in Table 1). It should be noted that we did not set the threshold of the relative docking score in the determination of the near-native models for the test datasets. This is because the structures of the component protomers of the test targets and those of the corresponding native complexes were determined under different crystallization conditions, and thus a comparison of the scores of the complex models for the test targets with those of the native complexes is not significant.
Table 1

The test dataset: the CAPRI targets

Target (PDB ID) [a] | Component proteins [b] | Near-native [c], Decoy [d], Highest rank [e] | Scoring function [f] | Characters of the native interface
T12 (1ohz [g]) | Cellulosomal scaffolding cohesin/dockerin xylanase domain | 1298 | fc3 | Almost flat, Hydrophobic and electrostatic complementary interface
T18 (−) | Endo-1, 4-B-xylanase/its inhibitor TAXI | 1291 | fc4 | Highly concave and convex, highly shape and no hydrophobic complementary interface
T21 (1zhi [h]) | Origin recognition complex subunit1/regulatory protein Sir1 | 1292, 1 | fc1, fc2 | Nonglobular complex, Highly electrostatic and modestly shape complementary interface
T25 (2j59 [i]) | ADP-ribosylation factor1/Rho GTPase-activating protein 10 ARF-binding domain | 1297103 | fc3 | Almost flat, Hydrophobic and shape complementary interface
T26 (2hqs [j]) | Peptidoglycan-associated lipoprotein/tolb | 71127 | fc3 | Concave and convex, Hydrophobic and shape complementary interface

Notes:

[a] The target identity and the PDB ID of the native heterodimer complex. The PDB ID of T18 is unknown.
[b] Information for the component proteins.
[c] The number of near-native models used in the test.
[d] The number of decoy models used in the test.
[e] The highest rank of the near-native model.
[f] The scoring function that made the highest rank of the near-native model.
[g] Carvalho et al.62
[h] Hou et al.63
[i] Menetrey et al.64
[j] Bonsor et al.65

The unbound–unbound pairs of the heterodimer entries

Six heterodimers, which have the monomeric structures of the two component protomers stored in the PDB, were found in the training dataset. We performed the unbound–unbound docking from the monomeric structures of these entries by our sampling method without ET scores so that up to 500 complex models were generated for each entry. Four entries were available for this test because the other two entries yielded no model with an L-rmsd smaller than 10.0 Å from the native complexes due to the conformational changes of the loop structures involved in the protein–protein interaction. All four of the entries had 10 or more models with L-rmsds smaller than 5.0 Å. Therefore, we chose 10 models with the largest values of the optimized objective functions among the complex models with L-rmsds smaller than 5.0 Å for each target as the near-native models. The other models with L-rmsds smaller than 10.0 Å were not used in this test. The information for these entries is summarized in Table 2.
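The selection of test models for the unbound–unbound pairs can be sketched as follows. This is a minimal illustration under the assumption that each model is represented as an (objective value, L-rmsd) pair; the function name and the tuple layout are hypothetical.

```python
def split_unbound_test_models(models, n_near=10, near_cut=5.0, decoy_cut=10.0):
    """Split models into near-native and decoy sets for the unbound test.

    `models` is a list of (objective_value, l_rmsd) pairs. The n_near models
    with the largest objective values among those with L-rmsd < near_cut are
    near-native; models with L-rmsd >= decoy_cut are decoys; every other
    model with L-rmsd < decoy_cut is left out of the test.
    """
    close = sorted(
        (m for m in models if m[1] < near_cut),
        key=lambda m: m[0],
        reverse=True,
    )
    near_native = close[:n_near]
    decoys = [m for m in models if m[1] >= decoy_cut]
    return near_native, decoys
```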
Table 2

The test dataset: the unbound–unbound pairs of the four heterodimer entries

Target [a] | Component proteins [b] | Near-native [c] | Decoy [d] | Highest rank [e] | Scoring function [f] | Characters of the native interface
1bvn [g] | 1hx0.A (alpha-amylase)/1ok0.A (its inhibitor) | 10 | 309 | 3 | fc4 | Modestly shape, and no hydrophobic and electrostatic complementary interface
1ewy [h] | 2bmw.A (ferredoxin-NADP reductase)/1czp.A (ferredoxin I) | 10 | 434 | 31 | fc4 | Large concave and convex, Modestly electrostatic and shape complementary interface
1p2j [i] | 1hj9.A (beta-trypsin)/5pti.A (its inhibitor) | 10 | 490 | 3 | fc2 | Small concave and convex, Electrostatic and shape complementary interface
1uug [j] | 3eug.A (uracil-DNA glycosylase)/1ugi.A (its inhibitor) | 10 | 469 | 10, 4 | fc1, fc2 | Large concave and convex, Highly complementary for three surface features

Notes:

[a] The PDB ID of the native complex of the training heterodimer entry.
[b] PDB IDs and chain IDs of the monomeric structures of the component proteins and their information.
[c] The number of near-native models used in the test.
[d] The number of decoy models used in the test.
[e] The highest rank of the near-native model.
[f] The scoring function that made the highest rank of the near-native model.
[g] Wiegand et al.66
[h] Morales et al.61
[i] Helland et al.67
[j] Putnam et al.59

Scoring function

A scoring function was defined as a linear combination of weighted complementarity scores for the hydrophobicity, the electrostatic potential, and the shape on the molecular surfaces of the protein–protein interface. The basis of the complementarity calculation was originally developed for the classification and analyses of homo-oligomer interfaces in our previous study.35 First, a Connolly surface44 consisting of triangular polygons was constructed for each protomer. Next, the hydrophobicity calculated by the Ooi–Oobatake method,45 and the electrostatic potential obtained by solving the Poisson–Boltzmann equation numerically with the SCB program46 were mapped onto each vertex on the Connolly surface. The shape of the surface was also considered using the average curvatures at each vertex.47 The interacting region on the surfaces was defined as a set of pairs of vertices from different surfaces with a distance shorter than 3.0 Å. Then, the complementarity scores, $H$, $E$, and $S$ for the hydrophobicity, the electrostatic potential and the shape, respectively, were defined as the ratio of the number of complementary vertex-pairs for the hydrophobicity ($N_H$, hydrophobic and hydrophobic), the electrostatic potential ($N_E$, opposite signs of the potential) or the shape ($N_S$, convex and concave), respectively, to the total number of vertex-pairs existing in the interface, $N_{all}$, as follows: $H = N_H / N_{all}$, $E = N_E / N_{all}$ and $S = N_S / N_{all}$. It should be noted that we used two indices of the shape complementarity of the interfaces in this study. One is the shape complementarity calculated by the objective function in the sampling step, which is used to choose complex models that have no or few clashes, moderately large areas, and almost continuous interfaces, and to eliminate poor models. The other is $S$, which represents the degree of shape complementarity over the interface and is used to compare different complex models in terms of the shape complementarities of their interfaces. 
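The three complementarity scores are simple ratios over the interface vertex-pairs, as the following sketch shows. The `VertexPair` layout, the `hyd_cut` threshold, and the sign conventions (positive hydrophobicity meaning hydrophobic, opposite curvature signs meaning convex against concave) are assumptions made for illustration; the optimized parameters that define a complementary pair are not given in the text.

```python
from typing import NamedTuple, Sequence, Tuple

class VertexPair(NamedTuple):
    """Surface properties at the two vertices of one interface vertex-pair."""
    hyd_a: float   # hydrophobicity on protomer A
    hyd_b: float   # hydrophobicity on protomer B
    pot_a: float   # electrostatic potential on protomer A
    pot_b: float   # electrostatic potential on protomer B
    curv_a: float  # average curvature on protomer A
    curv_b: float  # average curvature on protomer B

def complementarity_scores(
    pairs: Sequence[VertexPair], hyd_cut: float = 0.0
) -> Tuple[float, float, float]:
    """Return (H, E, S) as fractions of complementary vertex-pairs."""
    n_all = len(pairs)
    if n_all == 0:
        return 0.0, 0.0, 0.0
    # Hydrophobic complementarity: both vertices are hydrophobic.
    n_h = sum(1 for p in pairs if p.hyd_a > hyd_cut and p.hyd_b > hyd_cut)
    # Electrostatic complementarity: potentials of opposite sign.
    n_e = sum(1 for p in pairs if p.pot_a * p.pot_b < 0)
    # Shape complementarity: one vertex convex, the other concave.
    n_s = sum(1 for p in pairs if p.curv_a * p.curv_b < 0)
    return n_h / n_all, n_e / n_all, n_s / n_all
```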
The parameters to define the complementary vertex-pairs for the three surface features were optimized in conjunction with changing the distance cutoff in the definition of the interacting region, from 1.0 Å in the original study35 to 3.0 Å, so that the difference between the complementarity scores of the energy-minimized and nonenergy-minimized models was minimized. Since the optimization of the parameters was performed independently of this study, it will not be discussed further. Finally, the degree of complementarities, COMP, was defined as follows:

$$\mathrm{COMP} = W_H H + W_E E + W_S S \qquad (1)$$

where the weight parameters for the hydrophobicity, the electrostatic potential, and the shape, $W_H$, $W_E$, and $W_S$, are normalized so that $W_H^2 + W_E^2 + W_S^2 = 1$. The weight parameters were optimized by introducing the subparameters $w_1$, $w_2$, and $w_3$, so that $W_H = w_1/|w|$, $W_E = w_2/|w|$ and $W_S = w_3/|w|$, where $|w| = \sqrt{w_1^2 + w_2^2 + w_3^2}$, to ensure the normalization constraint. The subparameters were changed from −100 to 100 at intervals of 1. Thus, 8,120,600 (= $201^3 - 1$) weight combinations of $w_1$, $w_2$, and $w_3$ were considered, where the one excluded combination is $(w_1, w_2, w_3) = (0, 0, 0)$. The values of $W_H$, $W_E$, and $W_S$ each ranged from −1.0 through 1.0.
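The weight enumeration and the COMP score can be sketched in a few lines. The example assumes that the normalization is to unit Euclidean length, which is consistent with the radius of 1 used in the polar-coordinate conversion; the function names are hypothetical.

```python
import math
from itertools import product

def normalized_weights(w1, w2, w3):
    """Scale integer subparameters to a weight vector of unit length."""
    norm = math.sqrt(w1 * w1 + w2 * w2 + w3 * w3)
    if norm == 0:
        raise ValueError("(0, 0, 0) is not a valid weight combination")
    return w1 / norm, w2 / norm, w3 / norm

def comp(weights, scores):
    """Weighted sum of the complementarity scores (H, E, S)."""
    return sum(w * s for w, s in zip(weights, scores))

def count_weight_combinations(limit=100):
    """Number of integer triples in [-limit, limit]^3 excluding the origin."""
    side = 2 * limit + 1
    return side ** 3 - 1

def iter_weight_combinations(limit):
    """Yield every normalized weight combination for a given subparameter range."""
    for w in product(range(-limit, limit + 1), repeat=3):
        if w != (0, 0, 0):
            yield normalized_weights(*w)
```

For the range used in the text, `count_weight_combinations(100)` gives 8,120,600; iterating over all of them for every model is feasible but slow in pure Python, so a smaller `limit` is convenient for experimentation.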

Search for the successful weight combinations

The highly successful weight combinations in the selection of near-native models were searched among all of the possible weight combinations, to classify the heterodimers and then to construct the multiple scoring functions, as follows.

Conversion of the three-dimensional weight combinations into the two-dimensional space

The three-dimensional weight combinations were converted into the two-dimensional space of two angles, the zenith and azimuth angles in polar coordinates, where the radius = 1, the zenith angle was the angle between the $W_S$-axis (the shape weight) and the line from the origin to the considered point, and the azimuth angle was the angle between the positive $W_H$-axis (the hydrophobicity weight) and the line from the origin to the considered point, projected onto the $W_H$–$W_E$ plane. The two-dimensional space was separated into 162 grids at intervals of 20 degrees (9 zenith bins × 18 azimuth bins), as shown in Figure 2. We considered two more grids, which correspond to $(W_H, W_E, W_S) = (0, 0, 1)$ and $(W_H, W_E, W_S) = (0, 0, -1)$, because when the zenith angle is 0 or 180 degrees, namely $W_S = 1$ or $-1$, respectively, the azimuth angle cannot be defined. It should be noted that this weighting scheme did not yield an equal density of weight combinations. Therefore, these 164 grids contained different numbers of weight combinations, as shown in the third column of Supplementary Table 3.
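The assignment of a weight combination to one of the 164 grids can be sketched as below. The sketch assumes that the shape weight defines the zenith axis and that the azimuth is measured from the positive hydrophobicity-weight axis; the handling of values lying exactly on a bin boundary and the labels "north" and "south" for the two pole grids are hypothetical choices.

```python
import math

def grid_of(w_h, w_e, w_s, step=20.0):
    """Return the (zenith, azimuth) serial numbers of a weight combination.

    Serial numbers start at 1. The two poles, where the azimuth is
    undefined, are returned as "north" (0, 0, 1) and "south" (0, 0, -1).
    """
    if w_h == 0 and w_e == 0:
        if w_s == 0:
            raise ValueError("(0, 0, 0) has no direction")
        return "north" if w_s > 0 else "south"
    r = math.sqrt(w_h ** 2 + w_e ** 2 + w_s ** 2)
    # Clamp the cosine to guard against rounding slightly outside [-1, 1].
    zenith = math.degrees(math.acos(max(-1.0, min(1.0, w_s / r))))
    azimuth = math.degrees(math.atan2(w_e, w_h)) % 360.0
    n_zenith = int(round(180.0 / step))
    n_azimuth = int(round(360.0 / step))
    theta = min(int(zenith // step), n_zenith - 1) + 1
    phi = min(int(azimuth // step), n_azimuth - 1) + 1
    return theta, phi
```

With 20-degree bins this yields 9 × 18 = 162 regular grids plus the two pole grids.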
Figure 2

The distribution of the grids with high f values in each cluster. The grids with f values higher than 5.0 in the entries belonging to each cluster are colored based on the color bar in the bottom-right corner, where “C1”, “C2”, “C3” and “C4” mean Clusters 1, 2, 3 and 4, respectively. The outside grids labeled (0, 0, 1) and (0, 0, −1) are those corresponding to $(W_H, W_E, W_S) = (0, 0, 1)$ and $(W_H, W_E, W_S) = (0, 0, -1)$, respectively. The representative weight combinations of the grids surrounded by black dotted lines were defined as the multiple scoring functions, where the grids with f1, f2, f3 and f4 were selected from Clusters 1, 2, 3 and 4, respectively. The serial numbers of each grid for the zenith (θ) and azimuth (φ) angles are also shown on the axes of both angles, which are divided at intervals of 20 degrees.

An occurrence frequency of the successful weight combinations in a grid

In each training entry, for each weight combination, the COMP values of all complex models were calculated, and the complex models were ranked in the order of the COMP values. Then, an occurrence frequency, $f(\theta, \phi)$, of the weight combinations that could rank the near-native models in the top 10 was calculated in each grid according to the following equation:

$$f(\theta, \phi) = \frac{N_{success}(\theta, \phi) / N_{all}(\theta, \phi)}{N_{success} / N_{all}} \qquad (2)$$

where $N_{success}(\theta, \phi)$ was the number of weight combinations that could rank at least one near-native model in the top 10 in each grid, and $N_{all}(\theta, \phi)$ was the number of all of the possible weight combinations belonging to each grid, which is shown in the third column of Supplementary Table 3. Because $N_{all}(\theta, \phi)$ differs grid by grid as described above, $N_{success}(\theta, \phi)$ was normalized by $N_{all}(\theta, \phi)$ in Eq. 2, to avoid under- or overestimation in the calculation of f. The “θ” and “φ” were the serial numbers of each grid for the zenith and azimuth angles, respectively, and they were assigned at intervals of 20 degrees on the axes of both angles as shown in Figure 2. It should be noted that a prediction is generally regarded as “acceptably” successful when the correct solutions are ranked within the top 10. This criterion is also adopted in the CAPRI experiment.41,42 The factor $(N_{success}/N_{all})$ was introduced to correct for the differences between entries in the difficulty of ranking the near-native models in the top 10. $N_{success}$ was the summation of the $N_{success}(\theta, \phi)$s in all grids. $N_{all}$ was the summation of the $N_{all}(\theta, \phi)$s in all grids, namely $N_{all}$ = 8,120,600. If $(N_{success}/N_{all})$ is 1, then all weight combinations can rank the near-native models in the top 10. When $(N_{success}/N_{all})$ is considerably smaller than 1, only a few weight combinations can rank the near-native models highly. This indicates that the selection of the near-native models in the latter case is more difficult than that in the former case. A high $f(\theta, \phi)$ indicates that the weight combinations in the grid have a high probability of success in the selection of near-native models.
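The calculation of the occurrence frequency can be sketched as follows. The sketch assumes that f is the success fraction within a grid divided by the success fraction over all grids, a reading suggested by the f thresholds above 1 used later in the text; returning 0 for every grid when no weight combination succeeds is also an assumption, and the function names are hypothetical.

```python
def comp(weights, scores):
    """Weighted sum of the complementarity scores (H, E, S)."""
    return sum(w * s for w, s in zip(weights, scores))

def near_native_in_top(weights, models, top=10):
    """True if at least one near-native model ranks in the top `top` by COMP.

    `models` is a list of ((H, E, S), is_near_native) pairs.
    """
    ranked = sorted(models, key=lambda m: comp(weights, m[0]), reverse=True)
    return any(is_near_native for _, is_near_native in ranked[:top])

def occurrence_frequency(n_success, n_all):
    """Per-grid success fraction relative to the overall success fraction.

    `n_success` and `n_all` map a grid identifier to the number of
    successful and of all weight combinations in that grid.
    """
    total_success = sum(n_success.get(g, 0) for g in n_all)
    total_all = sum(n_all.values())
    if total_success == 0:
        # Assumption: f is taken as 0 everywhere when nothing succeeds.
        return {g: 0.0 for g in n_all}
    overall = total_success / total_all
    return {g: (n_success.get(g, 0) / n_all[g]) / overall for g in n_all}
```

For example, a grid in which half of the weight combinations succeed, in an entry where one in eight succeeds overall, obtains f = 4.0.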

Results and discussion

Classification of the heterodimer entries

We first tried to classify the 74 heterodimers to construct the multiple scoring functions that select the near-native models from many decoy models, as summarized in the flowchart in Supplementary Figure 1, where the whole procedure for constructing the multiple scoring functions is shown. The classification was performed based on the discriminative characters between near-native models and decoy models, which were found in the calculation of the f(θ, φ) for each grid in each entry, as follows. As shown in the seventh column of Supplementary Table 3, the number of entries in which a grid contained at least one successful weight combination varied widely from grid to grid. This suggests that there are no major grids in which the weight parameters can succeed in selecting near-native models in many entries, and therefore, a classification is required. Thus, the 74 training heterodimer entries were classified based on the f(θ, φ)s in all 164 grids in each entry, by the clustering method of program R,48 where the Euclidean distances between the 164-dimensional vectors of the f(θ, φ)s were used as the distances between entries. The distances between the clusters were then calculated by Ward’s method. This clustering divided the 74 training heterodimer entries clearly into two groups, one of which was further separated into two clear clusters, while the other was not divided. We investigated the grids where the entries belonging to each group had f higher than 5.0, and found that the separation in the former group was related to the grids with high f, as shown in Figure 2. We also found that the latter group might be separated into two clusters in the same manner as the former group. Therefore, we decided to classify the heterodimers into four clusters, Clusters 1 and 2 from one large group, and Clusters 3 and 4 from the other large group, containing 15, 12, 9, and 9 entries, respectively. It should be noted that we tried 1.0, 2.5, 5.0, and 7.5 as the f criterion to define the distribution of the grids. 
When either 1.0 or 2.5 was used as the criterion, the difference between the distributions in the two groups was unclear. On the other hand, some entries had no f(θ, φ) higher than or equal to 7.5. Therefore, we used 5.0 as the criterion. The grids where at least one entry belonging to a cluster had an f(θ, φ) higher than 5.0 were regarded as the “grids belonging to the cluster”, which were colored according to the color bar in Figure 2. Note that a grid could belong to two or more different clusters. The other 29 entries could not be classified into any cluster, because no weight combination could rank the near-native models in the top 10, namely the f(θ, φ)s in all grids were 0. Our method thus succeeded in the selection of the near-native models in 45 entries (60.8% = 45/74). To put this performance in context, we examined the performance of ZDOCK12 in the bound–bound docking for the 74 training heterodimers. ZDOCK included at least one complex model with an L-rmsd smaller than 10 Å from the native complex in the best 10 models in 62 entries (83.8% = 62/74). Because our criterion for a successful prediction is that at least one complex model with an L-rmsd smaller than 10 Å from the native complex and with a relative docking-score larger than 0.95 is ranked in the top 10, we calculated the relative docking-scores of the best 10 complex models generated by ZDOCK. We also tried 0.90 and 0.85 as the thresholds of the relative docking-score, because 0.95 might be a severe threshold for ZDOCK models, which were not optimized for the objective function by our sampling method. As a result, in 43 entries (58.1% = 43/74), at least one complex model met our criterion. For the 0.90 and 0.85 thresholds, 52 (70.3% = 52/74) and 56 (75.7% = 56/74) entries met the criteria, respectively. Thus, the performance of our method (60.8%) was comparable to that of ZDOCK under the same criterion (58.1%) in the bound–bound docking for our training dataset. 
All of the grids with high f(θ, φ)s in Cluster 1 had positive weights for the shape of the interface. This indicates that the shape complementarity was the most effective contributor in ranking the near-native models in the top 10. In other words, the shape complementarity was the “discriminator” of the near-native models from the other decoys. The discriminators in Clusters 2 and 3 were the complementarities for the electrostatic potential and the hydrophobicity, respectively. In Cluster 4, the weight of the shape contribution was positive; however, the weight of the hydrophobicity was negative. The information about these clusters is summarized in Table 3.
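The clustering step described in this section, Ward's method applied to Euclidean distances between per-entry f vectors, can be sketched in pure Python. This is a textbook implementation based on the Lance–Williams update for squared Euclidean distances, not the R routine used in the study, whose exact variant of Ward's method may differ; the function names are hypothetical.

```python
from itertools import combinations

def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def ward_clusters(vectors, n_clusters):
    """Agglomerate vectors with Ward's criterion until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of input indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = {i: [i] for i in range(len(vectors))}
    dist = {
        frozenset((i, j)): sq_euclidean(vectors[i], vectors[j])
        for i, j in combinations(clusters, 2)
    }
    next_id = len(vectors)
    while len(clusters) > n_clusters:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update of distances to the merged cluster.
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())
```

In the setting of the text, each input vector would hold the 164 f(θ, φ) values of one training entry.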
Table 3

Discriminator of the scoring function and characteristics of the native complexes in each cluster

Cluster [a] | Discriminator [b] (Hydrophobic, Electrostatic, Shape) | Native characters [c]
1 | +++ | Modestly globular complex. Highly shape and electrostatic complementary interface
2 | ++ | Nonglobular complex. Highly electrostatic complementary interface
3 | + | Almost globular complex. Hydrophobic complementary interface
4 | + | Shape complementary but no hydrophobic complementary interface

Notes:

[a] The cluster identity.
[b] The discriminator in each cluster. The terms “hydrophobic”, “electrostatic” and “shape” mean the complementarities for the hydrophobicity, the electrostatic potential and the shape, respectively. The “+” means that the corresponding weight had a positive effect on the selection of the near-native models. On the other hand, the “−” means that the weight did not contribute to the selection. The weight with “++” contributes significantly to the selection.
[c] The characters of the native complexes of the entries classified as the cluster.

Construction of the multiple scoring functions

Based on the classification results, the multiple scoring functions were constructed so that each function was applicable to the selection of the near-native models in the heterodimer entries belonging to each cluster, as follows. First, for each grid, we took the respective averages of the three weight values over all weight-combinations belonging to the grid as the representative weight-combination of that grid, which we designated as W. Then, the near-native models were again selected by using the 164 Ws for the training entries. Finally, four Ws, each of which was a W in a grid belonging to a different cluster, were chosen so that the total number of successful entries in the selections by the four Ws was maximized. Since there were cases where the near-native models in an entry could be ranked in the top 10 by two or more Ws belonging to different clusters, the total number of successful entries by the four Ws was counted as follows, to avoid overlaps in counting: the number of successful entries by a W from Cluster 1 was counted, and then, among the entries that failed with the W from Cluster 1, the number of successful entries by a W from Cluster 2 was counted. This procedure was iterated up to Cluster 4. The number of successful entries was counted for all of the possible combinations of the four Ws from the four clusters. Consequently, we selected the four Ws, with grids surrounded by the dotted lines in Figure 2, as the multiple scoring functions, and designated them as f1, f2, f3 and f4, from Clusters 1, 2, 3 and 4, respectively. The actual weight values of the four Ws are f1: (Wh, We, Ws) = (0.34, 0.40, 0.84), f2: (−0.27, 0.71, −0.64), f3: (0.74, 0.13, −0.64), and f4: (−0.52, −0.10, 0.84), where Wh, We and Ws are the weights for the hydrophobicity, the electrostatic potential and the shape, respectively. The total number of successful entries by the four Ws was 33 (73.3% = 33/45), where 45 was the number of entries where the near-native models could be selected by any Ws.
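The selection of the four representative Ws described above amounts to an exhaustive search over one candidate grid per cluster, scored by the number of distinct entries solved. The sketch below illustrates this under stated assumptions: the function names and the input layout (one dictionary per cluster, mapping a grid identifier to the set of entries whose near-native models that grid's W ranks in the top 10) are hypothetical and are not taken from the paper.

```python
from itertools import product


def count_successes(success_sets):
    """Count solved entries cluster by cluster, skipping entries already counted.

    Counting Cluster 1 first and then only the still-failed entries for the
    next cluster gives the size of the union of the success sets.
    """
    solved = set()
    for entries in success_sets:  # order: Cluster 1, 2, 3, 4
        solved |= entries - solved
    return len(solved)


def choose_representatives(candidates_by_cluster):
    """Try every combination of one grid per cluster and keep the best one.

    candidates_by_cluster: list of dicts {grid_id: set_of_solved_entry_ids}.
    Returns (tuple_of_grid_ids, number_of_distinct_solved_entries).
    """
    best_combo, best_count = None, -1
    for combo in product(*(c.items() for c in candidates_by_cluster)):
        count = count_successes([entries for _, entries in combo])
        if count > best_count:
            best_combo = tuple(grid for grid, _ in combo)
            best_count = count
    return best_combo, best_count
```

Ties are resolved in favor of the first combination encountered; the paper does not state how ties were handled, so this is an arbitrary choice.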

Classification results of heterodimers in the training dataset

The heterodimers in our training dataset were classified based on the occurrence frequencies of the weight-combinations that could select the near-native models, as described above. Next, we tried to find the common characteristics in each cluster, and to investigate whether the classification results were related to the biological functions. To find the characteristics of the heterodimers, we examined the native complexes of the heterodimer entries from the aspects of the whole complex structures and the interface shapes by assessing them visually,49 and the aspect of the interaction modes by checking the complementarity scores for the hydrophobicity, the electrostatic potential, and the shape at the interfaces, designated as H, E, and S, respectively. The common characteristics of the native complexes in each cluster are summarized in Table 3.

Common characteristics in Cluster 1

In 11 entries among the 15 entries belonging to Cluster 1, the interfaces of the native complexes have higher Ss than the average of the Ss in the 74 training entries (0.36). The Ss in the other four entries are lower than the average, but are not very small (1m2t: 0.33, 1o6s: 0.34, 1sq2: 0.34, and 1t6g: 0.34). The overall structures of these 15 entries are modestly “globular”. Eight of them also have higher Es than the average of the Es (0.38). The entry in Figure 3A: the heterodimer of lysozyme C and antigen receptor V domain (1sq2),50 which has a lower S (0.34) than the average, shows that the proteins interact with each other by placing concave surfaces on convex surfaces. This suggests that shape complementarity is the dominant characteristic in this cluster. It corresponds to the discriminator in this cluster, namely the character of f1.
Figure 3

The characters of the native complexes of the heterodimer entries belonging to each cluster. For an entry in each cluster, the whole complex structure, the interface region colored purple, and the electrostatic potential mapped on the surfaces, where the negative and positive electrostatic potentials are colored red and blue, respectively, of the native complex are shown. The middle and left figures are shown in open-book view. A) An example in Cluster 1 (chains L and N of 1sq2). B) An example in Cluster 2 (chains A and B of 1hx1). C) An example in Cluster 3 (chains A and B of 1rj9). D) An example in Cluster 4 (chains A and B of 1jql). E) An example of the failed entries in the selection of near-native models (chains A and B of 1tej).

Common characteristics in Cluster 2

Among the 12 entries in Cluster 2, 11 interfaces of the native complexes have higher Es than the average (0.38), and their overall complex structures are “nonglobular”, as shown in Figure 3B: the heterodimer of the chaperone ATPase domain and the BAG chaperone regulator (PDB ID 1hx1, E: 0.61),51 where the electrostatically positive surfaces, colored blue, tightly interact with the electrostatically negative surfaces, colored red. Thus, the characteristic surface feature of the native interfaces in Cluster 2 could be the electrostatic complementarity, which corresponds to the character of f2. The last entry, 1fxw, has lower complementarity scores for all three surface features (H: 0.10, E: 0.28, and S: 0.29) than the averages in the 74 training entries (0.16, 0.38, and 0.36, respectively). No significant characteristics were found for this example.

Common characteristics in Cluster 3

In seven of the nine entries classified in Cluster 3, the interfaces of the native complexes have higher Hs than the average (0.16). The interface shapes are either slightly convex and concave or almost flat. The whole complex structures are more “globular” than those in Clusters 1 and 2. The heterodimer of the GTPase domain of a signal recognition particle and its receptor (1rj9),52 shown in Figure 3C, has an almost flat interface and a higher H (0.26) than the average, and resembles a homodimer interface. Thus, the characteristic surface feature of the native interfaces in this cluster could be the hydrophobic complementarity, which corresponds to the character of f3. Of the remaining two entries, one entry, 1clv, has lower complementarity scores for all three surface features (H: 0.11, E: 0.13, and S: 0.34), and the other entry, 1uzx, has lower H and S (H: 0.12, E: 0.55, and S: 0.29) than the averages (0.16, 0.38, and 0.36). Since the interface of the former entry consists of relatively highly convex and concave surfaces, this interface is considered to be similar to those of the entries belonging to Cluster 1. We could not explain why the latter entry differed.
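The comparisons against the training-set averages in these cluster descriptions are simple threshold checks. As a small illustration (the function and dictionary names are hypothetical; the averages and the scores of 1fxw, 1clv and 1uzx are the values quoted in the text):

```python
# Average complementarity scores over the 74 training entries, as quoted in the text.
TRAINING_AVERAGES = {"H": 0.16, "E": 0.38, "S": 0.36}


def above_average(scores, averages=TRAINING_AVERAGES):
    """Return, for each of H, E and S, whether the score exceeds the average."""
    return {key: scores[key] > averages[key] for key in averages}


# Scores of the outlier entries discussed for Clusters 2 and 3.
OUTLIERS = {
    "1fxw": {"H": 0.10, "E": 0.28, "S": 0.29},
    "1clv": {"H": 0.11, "E": 0.13, "S": 0.34},
    "1uzx": {"H": 0.12, "E": 0.55, "S": 0.29},
}

for pdb_id, scores in OUTLIERS.items():
    print(pdb_id, above_average(scores))
```

For 1uzx only the electrostatic score exceeds its average, which is consistent with the statement that this entry has lower H and S than the averages.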

Common characteristics in Cluster 4

For all nine of the entries belonging to Cluster 4, the near-native models could be ranked in the top 10 by fewer weight-combinations than those in the other clusters. This indicates that the selection of the near-native models in this cluster was more difficult than those in the other clusters. The native interface of one entry has a steep shape, made of one loop structure, and those of the other five entries have smooth shapes, as shown in Figure 3D: the heterodimer of DNA polymerase III beta and delta chains (1jql).53 In the other three entries, the native complexes have a few water molecules at the interacting regions. No characteristic of the native complexes was found in this cluster, and the features of these entries were similar to those of the entries that failed in the selection of the near-native models, described in the next section.

Failed entries in selecting the near-native models

Among the 74 training heterodimers, 29 were not classified as any clusters because no near-native model could be ranked in the top 10 by any weight combination. The native complexes in six entries have steep shapes at the interfaces and those in 17 other entries have smooth shapes or almost flat interfaces. In these 23 (= 6 + 17) entries, the protomers of the dimers could bind tightly with each other at different surface regions from the native interfaces, thus generating many decoy models with high complementarity scores, as shown in Figure 3E: a disintegrin heterodimer (1tej).54 In the other six entries, the native complexes have water or ligand molecules in the interacting regions. These native interfaces have lower complementarity scores than those expected. This is because the protein–water and protein–ligand interactions were not considered in the calculation of the complementarities. Thus, the complementarity scores of the near-native models were also lower than those of other decoy models. The many decoy models with high complementarity scores in the former 23 entries, and the low complementarity scores of the near-native models in the latter six entries made the correct selection difficult. Further optimization of the parameters or the introduction of other parameters in the calculation of interface complementarities might be required for these cases.

Biological functions of the heterodimers in each cluster

Among the 74 training entries, 19 enzyme-inhibitor complexes were included, as marked in Supplementary Table 2. We examined the clusters to which these enzyme-inhibitor complexes belonged, in order to investigate whether the classification results were related to the biological function. Twelve complexes were classified into four different clusters, with five, two, two, and three entries belonging to Clusters 1, 2, 3, and 4, respectively. The other seven entries were not classified into any cluster because they failed in selecting the near-native models. In 14 of the 19 enzyme-inhibitor complexes, the native interfaces are formed through the interaction between the concave and electrostatically negative surface of the enzyme and the convex and electrostatically positive surface of the inhibitor, as shown in Figure 4B: the heterodimer of alkaline metalloproteinase and its inhibitor (1jiw),55 Figure 4C: alpha-amylase and its inhibitor (1clv),56 and Figure 4D: endo-1,4-beta-xylanase and its inhibitor (1ta3).57 However, as these examples show, they have diverse depths and sizes of cavities and different ratios of molecular sizes between the enzyme and the inhibitor proteins. The other four enzyme-inhibitor complexes have both electrostatically positive and negative surfaces on each side of the interfaces, as shown in Figure 4A: the heterodimer of the TEM-1 beta-lactamase and its inhibitor protein II (1jtd).58 In the remaining entry, 1uug, the heterodimer of uracil-DNA glycosylase and its inhibitor,59 which was not classified in any cluster and is shown in Figure 4E, the interface on the enzyme side is electrostatically positive, and that on the inhibitor is electrostatically negative. These observations indicate that heterodimers with the same protein functions can have different discriminative characters between the near-native and the decoy models, and can also have different dominant characters in their native interfaces.
Figure 4

The characters of the native complexes of the enzyme-inhibitor type heterodimers. For the enzyme-inhibitor dimer classified as each cluster, the whole complex structure, the interface region colored purple, and the electrostatic potential mapped on the surfaces, where the negative and the positive electrostatic potentials are colored red and blue, respectively, of the native complex are shown. The middle and left figures are shown in open-book view. A) An example in Cluster 1 (chains A and B of 1jtd). B) An example in Cluster 2 (chains I and P of 1jiw). C) An example in Cluster 3 (chains A and I of 1clv). D) An example in Cluster 4 (chains A and B of 1ta3). E) An example of the failed entries in the selection of near-native models (chains C and D of 1uug).

It is widely accepted that transient and permanent complexes differ in terms of the type of interactions: the former complexes are often formed through salt bridges and hydrogen bonds, while the latter are formed through hydrophobic interactions.2 Since the identification of transient complexes is difficult, we tried to find stable heterodimers by checking the primary citations of the native complexes of the training heterodimer entries, and also to find transient heterodimers by referring to the list of transient heterodimers by Nooren and Thornton.60 We found 13 stable heterodimers and eight transient heterodimers. Among the transient heterodimers, five entries were included in their list, and the other three entries contained domains with the same SCOP family identities38 as those of the listed heterodimers. Both the stable and the transient heterodimers were distributed over different clusters, as shown in Supplementary Table 2. This suggests that the discriminative interface characters are not shared within the transient complexes or within the stable complexes, and moreover, that there are no clear differences between the discriminative characters of transient complexes and those of stable complexes. Thus, the clusters based on the discriminative interface characters between the near-native and the decoy models were independent of the types of biological functions of the heterodimers, and they were only related to the dominant characters of the native heterodimer interfaces.

Scoring tests for unbound docking models

The multiple scoring functions were tested in the selection of the correct solutions from complex models, which were generated from the monomeric structures of the component proteins of heterodimers. Two datasets were tested: one was the set of five CAPRI targets,8,21 T12, T18, T21, T25, and T26, and the other was the set of four pairs of the monomeric structures for four training heterodimer entries. For each target, both near-native and decoy models were generated by our sampling method from the monomeric structures in the unbound–bound form (T12, T18, and T25), and in the unbound–unbound form (T21, T26, and the four training entries). Note that the complex models for the CAPRI targets were generated by considering the sequence conservations by the ET method, as described in Materials and methods. Because we narrowed the search of complex models according to the result of the ET, we could not obtain a large number of models. Thus, the numbers of complex models in these targets were small and diverse. Although we did not calculate the number of false positive models for each target, because the relative docking-scores could not be estimated for unbound docking models as described before, the difficulty of selecting the near-native models may differ target by target. In the scoring test, the complex models were ranked by each of the four scoring functions, and the prediction was considered to be successful when at least one near-native model could be ranked in the top three by at least one scoring function. As a result, in two of the five CAPRI targets and in two monomer pairs of the heterodimer entries, at least one scoring function could rank the near-native models within the top three. In the other three targets, the near-native models were ranked within the top 10.
The characteristic surface features of the native interfaces also corresponded to the characters of the successful scoring functions in these seven targets, as summarized in Tables 1 and 2 and shown in Supplementary Figure 2. In the CAPRI target T25 and the monomer pair of 1ewy,61 no scoring function could rank the near-native models in the top 10. The highest ranks of the near-native models were 103 by f in T25 and 31 by f in 1ewy. The native complex of T25 has a hydrophobic interface with a complementary shape, and that of 1ewy has a modestly electrostatic and shape complementary interface. These features suggest that f for T25 and f or f for 1ewy are appropriate for selections of the near-native models. Thus, the characters of the scoring functions that made the highest ranks, also corresponded to the characteristic features of the native interfaces in these two entries.
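The success criterion of the scoring test (at least one near-native model in the top three by at least one of the four scoring functions) can be sketched as follows. The function names and the input layout are hypothetical: a dictionary that maps each scoring-function label to the ranks of the near-native models under that function. The labels used in the example are placeholders, since this is only an illustration of the bookkeeping.

```python
def best_rank(ranks_by_function):
    """Return (function_label, rank) for the best-ranked near-native model.

    ranks_by_function: dict {label: list of 1-based ranks of near-native models}.
    Returns (None, None) if no near-native model is present at all.
    """
    best = (None, None)
    for label, ranks in ranks_by_function.items():
        if ranks and (best[1] is None or min(ranks) < best[1]):
            best = (label, min(ranks))
    return best


def prediction_succeeds(ranks_by_function, top_n=3):
    """True if any scoring function places a near-native model within top_n."""
    _, rank = best_rank(ranks_by_function)
    return rank is not None and rank <= top_n


# Placeholder example: the best near-native rank is 103, as reported for T25,
# so the target fails both the top-three and the top-10 criteria.
example = {"fa": [103, 250], "fb": [310], "fc": [], "fd": [150]}
print(best_rank(example), prediction_succeeds(example))
```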

Conclusion

In this study, we constructed the multiple scoring functions based on the classification of the diverse heterodimers. Of the four clusters found in this study, Cluster 1 contained the largest number of entries (15 entries); however, there were only small differences between the number of entries in Cluster 1 and those in the other clusters (12, 9, and 9 in Clusters 2, 3, and 4, respectively). In other words, based on our classification scheme, no major cluster with a dominant interaction mode was found. Therefore, we think that the multiple scoring functions constructed according to our classification scheme may have a better potential for selecting the near-native models of heterodimers than a single scoring function. In an actual prediction, the selection of one scoring function appropriate for a given pair of protomers may be required. We consider that one possible approach to the selection is as follows: the COMP values of all complex models are calculated by each of the four scoring functions, and then the Z-scores are estimated from the COMP values. The scoring function with the best Z-score can be the most appropriate scoring function. This approach succeeded in ranking the near-native models in the top 10 in two CAPRI targets (T21 and T26) and one monomer pair (1bvn) of our test datasets.
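The Z-score-based choice of a scoring function suggested in the Conclusion could be sketched as below. Two points are assumptions made only for this illustration and are not stated in this section: COMP is taken to be the weighted sum of the three complementarity scores (H, E, S), and the Z-score of a scoring function is taken to be that of its top-ranked model relative to all models. The weight values are those of f1 to f4 given earlier in the text; the function names are hypothetical.

```python
from statistics import mean, pstdev

# (Wh, We, Ws) of the four scoring functions, as given in the text.
SCORING_FUNCTIONS = {
    "f1": (0.34, 0.40, 0.84),
    "f2": (-0.27, 0.71, -0.64),
    "f3": (0.74, 0.13, -0.64),
    "f4": (-0.52, -0.10, 0.84),
}


def comp(weights, hes):
    """Assumed form of COMP: weighted sum of the (H, E, S) scores of one model."""
    return sum(w * x for w, x in zip(weights, hes))


def top_z_score(values):
    """Assumed Z-score: how far the best value lies above the mean, in SD units."""
    sd = pstdev(values)
    if sd == 0:
        return 0.0
    return (max(values) - mean(values)) / sd


def select_scoring_function(models):
    """Return (label of the function with the best Z-score, all Z-scores).

    models: list of (H, E, S) tuples, one per complex model.
    """
    z_scores = {
        label: top_z_score([comp(weights, m) for m in models])
        for label, weights in SCORING_FUNCTIONS.items()
    }
    return max(z_scores, key=z_scores.get), z_scores
```

Because a Z-score is insensitive to the scale of the weights, functions whose weights differ only in magnitude along the varying feature give the same value; other definitions of the Z-score would be equally compatible with the wording of the text.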
The flowchart of the procedures for constructing the multiple scoring functions. Abbreviations: NN, near-native model; FP, false positive model.

The characters of the native complexes of the targets in the scoring test datasets. For the native complex for each target used in the scoring tests, the whole complex structure, the interface region colored purple, and the electrostatic potential mapped on the surfaces, where the negative and the positive electrostatic potentials are colored red and blue, respectively, are shown. The middle and left figures are shown in open-book view. A) The CAPRI targets. Figures for the native complexes of targets T12 (1ohz), T18, T21 (1zhi), T25 (2j59) and T26 (2hqs) are shown, beginning at the top. B) The unbound–unbound pairs of four heterodimer entries. Figures for the native complexes of 1bvn, 1ewy and 1p2j are shown, beginning at the top.
Table S1

Heterodimer entries not used to construct the scoring functions

PDBID [a] | Chain 1 | Chain 2 | Residue 1 [b] | Residue 2 [c]
1b0nAB11157
1dceCD567331
1devCD19641
1e44BA9685
1eucBA396311
1euvAB21186
1f2tAB149148
1f34AB326149
1f3uFE171118
1f60AB45894
1fs0GE230138
1g8kFE825133
1gk9BA557260
1go3MN187107
1gzsAB180165
1h0hAB977214
1h1rAB303258
1h9h*EI22336
1hfeMT421123
1i2mBA402216
1iznAB286277
1jdhAB52938
1jkgBA250140
1jltAB122122
1ka9FH252200
1kfuLS699184
1ld8BA437382
1lp1AB5858
1m1eAB53881
1mtpAB32343
1mu2AB555426
1n1jBA9793
1nf3AC195128
1o94DC320264
1o97DC320264
1oo0AB147110
1or7BF19490
1p5vAB235147
1q7lAB19888
1r8oAB9671
1rp3GH23988
1s9dEA203164
1tqyGH424415
1ubkLS534267
1ugpBA226203
1vetBA125124
1vf6BD8372
6reqCD727637

Notes:

a The entry marked with *, 1h9h, failed in the energy minimization.

b The number of residues in chain 1.

c The number of residues in chain 2.

Table S2

74 training heterodimer entries

PDBID | Chain 1 | Chain 2 | Residue 1 [a] | Residue 2 [b] | Cluster [c] | Function [d] | NN [e] | FP [f]
1b2sAD11090C4e820
1bvnPT49674C4e73
1c1yAB16777C2t1300
1clvAI47132C3e48
1ct4EI18551C3e4144
1cxzAB18286C1t3216
1d2zDC153108C133
1d4xAG375126C324
1dj7AB11775C4s414
1dtdAB30361e4101
1e96BA203192C2t4157
1ewyAC3039810238
1f3vAB179171281
1f7zAI23365e396
1fm0ED15081C4s36
1fr2BA13486C2236
1fxwAF232229C2s34
1fyhAB258229C44109
1gl1CK24536e626
1gl4AB28598533
1h32AB261138C1s22
1he1CA176135t318
1hx1AB400114C25105
1ibrDC462216t411
1irdBA146141s38
1j2jAB16645466
1jatAB155138C2535
1jiwPI470106C2e615
1jqlAB366140C4547
1jtdBA273263C1e947
1jtgAB263165e63
1kd8EF3636C346
1ki1BA352188533
1kliHL25469s23
1kpsDC1711593128
1kshAB186152C4684
1kxqBG496120C2e721
1kz7AB353188t33
1l4dAB2491224165
1lshAB1056319C3s5135
1lw6EI28164C1e29
1m2tBA263254C1s521
1m9xBC165146C15121
1mbxAC1421063159
1mqkHL127120s42
1nf5DC286123s6302
1nrjBA218158C3335
1nw9BA27798619
1o5eHL255114C1s33
1o6sAB466105C1215
1oc0AB37951C1e2184
1ow3AB242193C3t314
1p2jAI22358C1e329
1qavBA11590s134
1rj9AB304300C3433
1shwBA1811388154
1sq2LN129113C1348
1sv0AC8582C2588
1svxBA395169C16271
1t6bYX735189C41078
1t6gAC381184C1e945
1ta3BA303274C4e3155
1te1AB274190e6136
1tejAB6464s5174
1tmqAB471117e616
1tueLK218212C26164
1u0sYA11886C1t416
1ukvGY45320633
1usuAB260170C23192
1uugCD22984e749
1uw4DC24891226
1uzxAB16976C35122
1v74AB10787C2539
3fapAB107948315

Notes:

a The number of residues in chain 1.

b The number of residues in chain 2.

c The cluster in which the entry was classified. “C1”, “C2”, “C3” and “C4” mean Clusters 1, 2, 3 and 4, respectively. The “−” means that the entry failed in the selection of near-native models.

d The entries with the signs “e”, “t” and “s” were discussed in terms of their biological functions in the text. The “e” means that the entry is an enzyme–inhibitor type complex. The “s” means that the entry is considered as a stable complex. The “t” means that the entry is considered as a transient complex by Nooren and Thornton.1

e The number of near-native models.

f The number of false positive models.

Table S3

Data for 164 grids

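θ [a] | φ [b] | Ngrid_possible [c] | Wh [d] | We [e] | Ws [f] | Ngrid_entry > 0 [g]
1 | 1 | 6,985 | −0.24 | −0.05 | 0.97 | 9
1 | 2 | 7,750 | −0.20 | −0.12 | 0.97 | 8
1 | 3 | 7,806 | −0.15 | −0.18 | 0.97 | 7
1 | 4 | 7,832 | −0.08 | −0.22 | 0.97 | 6
1 | 5 | 7,829 | 0.00 | −0.23 | 0.97 | 6
1 | 6 | 7,832 | 0.08 | −0.22 | 0.97 | 6
1 | 7 | 7,806 | 0.15 | −0.18 | 0.97 | 7
1 | 8 | 7,750 | 0.20 | −0.12 | 0.97 | 8
1 | 9 | 6,985 | 0.24 | −0.05 | 0.97 | 11
1 | 10 | 8,776 | 0.22 | 0.04 | 0.97 | 14
1 | 11 | 7,750 | 0.20 | 0.12 | 0.97 | 16
1 | 12 | 7,806 | 0.15 | 0.18 | 0.97 | 20
1 | 13 | 7,832 | 0.08 | 0.22 | 0.97 | 17
1 | 14 | 7,829 | 0.00 | 0.23 | 0.97 | 18
1 | 15 | 7,832 | −0.08 | 0.22 | 0.97 | 17
1 | 16 | 7,806 | −0.15 | 0.18 | 0.97 | 16
1 | 17 | 7,750 | −0.20 | 0.12 | 0.97 | 15
1 | 18 | 8,776 | −0.22 | 0.04 | 0.97 | 12
2 | 1 | 32,634 | −0.52 | −0.10 | 0.84 | 10
2 | 2 | 33,681 | −0.45 | −0.26 | 0.84 | 5
2 | 3 | 33,747 | −0.34 | −0.40 | 0.84 | 2
2 | 4 | 33,819 | −0.18 | −0.49 | 0.84 | 1
2 | 5 | 33,647 | 0.00 | −0.53 | 0.84 | 1
2 | 6 | 33,819 | 0.18 | −0.49 | 0.84 | 1
2 | 7 | 33,747 | 0.34 | −0.40 | 0.84 | 3
2 | 8 | 33,681 | 0.45 | −0.26 | 0.84 | 5
2 | 9 | 32,634 | 0.52 | −0.10 | 0.84 | 7
2 | 10 | 35,031 | 0.52 | 0.09 | 0.84 | 16
2 | 11 | 33,681 | 0.45 | 0.26 | 0.84 | 19
2 | 12 | 33,747 | 0.34 | 0.40 | 0.84 | 19
2 | 13 | 33,819 | 0.18 | 0.49 | 0.84 | 18
2 | 14 | 33,647 | 0.00 | 0.53 | 0.84 | 19
2 | 15 | 33,819 | −0.18 | 0.49 | 0.84 | 19
2 | 16 | 33,747 | −0.34 | 0.40 | 0.84 | 19
2 | 17 | 33,681 | −0.45 | 0.26 | 0.84 | 17
2 | 18 | 35,031 | −0.52 | 0.09 | 0.84 | 16
3 | 1 | 69,113 | −0.74 | −0.14 | 0.64 | 9
3 | 2 | 90,640 | −0.66 | −0.39 | 0.63 | 1
3 | 3 | 106,400 | −0.50 | −0.59 | 0.62 | 0
3 | 4 | 77,866 | −0.27 | −0.71 | 0.64 | 1
3 | 5 | 68,054 | 0.00 | −0.75 | 0.65 | 1
3 | 6 | 77,866 | 0.27 | −0.71 | 0.64 | 2
3 | 7 | 106,400 | 0.50 | −0.59 | 0.62 | 2
3 | 8 | 90,640 | 0.66 | −0.39 | 0.63 | 4
3 | 9 | 69,113 | 0.74 | −0.14 | 0.64 | 9
3 | 10 | 72,059 | 0.74 | 0.13 | 0.64 | 15
3 | 11 | 90,640 | 0.66 | 0.39 | 0.63 | 17
3 | 12 | 106,400 | 0.50 | 0.59 | 0.62 | 20
3 | 13 | 77,866 | 0.27 | 0.71 | 0.64 | 21
3 | 14 | 68,054 | 0.00 | 0.75 | 0.65 | 21
3 | 15 | 77,866 | −0.27 | 0.71 | 0.64 | 19
3 | 16 | 106,400 | −0.50 | 0.59 | 0.62 | 22
3 | 17 | 90,640 | −0.66 | 0.39 | 0.63 | 19
3 | 18 | 72,059 | −0.74 | 0.13 | 0.64 | 17
4 | 1 | 49,496 | −0.91 | −0.17 | 0.35 | 10
4 | 2 | 75,588 | −0.80 | −0.48 | 0.35 | 2
4 | 3 | 102,916 | −0.61 | −0.70 | 0.35 | 1
4 | 4 | 58,480 | −0.33 | −0.87 | 0.35 | 0
4 | 5 | 48,037 | 0.00 | −0.93 | 0.35 | 1
4 | 6 | 58,480 | 0.33 | −0.87 | 0.35 | 2
4 | 7 | 102,916 | 0.61 | −0.70 | 0.35 | 2
4 | 8 | 75,588 | 0.80 | −0.48 | 0.35 | 3
4 | 9 | 49,496 | 0.91 | −0.17 | 0.35 | 10
4 | 10 | 51,523 | 0.91 | 0.16 | 0.35 | 14
4 | 11 | 75,588 | 0.80 | 0.48 | 0.35 | 19
4 | 12 | 102,916 | 0.61 | 0.70 | 0.35 | 18
4 | 13 | 58,480 | 0.33 | 0.87 | 0.35 | 19
4 | 14 | 48,037 | 0.00 | 0.93 | 0.35 | 19
4 | 15 | 58,480 | −0.33 | 0.87 | 0.35 | 17
4 | 16 | 102,916 | −0.61 | 0.70 | 0.35 | 20
4 | 17 | 75,588 | −0.80 | 0.48 | 0.35 | 17
4 | 18 | 51,523 | −0.91 | 0.16 | 0.35 | 15
5 | 1 | 43,535 | −0.97 | −0.18 | 0.00 | 10
5 | 2 | 66,479 | −0.85 | −0.51 | 0.00 | 1
5 | 3 | 90,408 | −0.65 | −0.75 | 0.00 | 1
5 | 4 | 51,433 | −0.35 | −0.93 | 0.00 | 0
5 | 5 | 42,244 | 0.00 | −0.99 | 0.00 | 0
5 | 6 | 51,433 | 0.35 | −0.93 | 0.00 | 2
5 | 7 | 90,408 | 0.65 | −0.75 | 0.00 | 2
5 | 8 | 66,479 | 0.85 | −0.51 | 0.00 | 3
5 | 9 | 43,535 | 0.97 | −0.18 | 0.00 | 7
5 | 10 | 45,313 | 0.97 | 0.17 | 0.00 | 14
5 | 11 | 66,479 | 0.85 | 0.51 | 0.00 | 19
5 | 12 | 90,408 | 0.65 | 0.75 | 0.00 | 17
5 | 13 | 51,433 | 0.35 | 0.93 | 0.00 | 15
5 | 14 | 42,244 | 0.00 | 0.99 | 0.00 | 15
5 | 15 | 51,433 | −0.35 | 0.93 | 0.00 | 19
5 | 16 | 90,408 | −0.65 | 0.75 | 0.00 | 18
5 | 17 | 66,479 | −0.85 | 0.51 | 0.00 | 17
5 | 18 | 45,313 | −0.97 | 0.17 | 0.00 | 13
6 | 1 | 49,496 | −0.91 | −0.17 | −0.35 | 7
6 | 2 | 75,588 | −0.80 | −0.48 | −0.35 | 1
6 | 3 | 102,916 | −0.61 | −0.70 | −0.35 | 0
6 | 4 | 58,480 | −0.33 | −0.87 | −0.35 | 0
6 | 5 | 48,037 | 0.00 | −0.93 | −0.35 | 0
6 | 6 | 58,480 | 0.33 | −0.87 | −0.35 | 1
6 | 7 | 102,916 | 0.61 | −0.70 | −0.35 | 1
6 | 8 | 75,588 | 0.80 | −0.48 | −0.35 | 4
6 | 9 | 49,496 | 0.91 | −0.17 | −0.35 | 6
6 | 10 | 51,523 | 0.91 | 0.16 | −0.35 | 12
6 | 11 | 75,588 | 0.80 | 0.48 | −0.35 | 13
6 | 12 | 102,916 | 0.61 | 0.70 | −0.35 | 12
6 | 13 | 58,480 | 0.33 | 0.87 | −0.35 | 14
6 | 14 | 48,037 | 0.00 | 0.93 | −0.35 | 14
6 | 15 | 58,480 | −0.33 | 0.87 | −0.35 | 18
6 | 16 | 102,916 | −0.61 | 0.70 | −0.35 | 17
6 | 17 | 75,588 | −0.80 | 0.48 | −0.35 | 13
6 | 18 | 51,523 | −0.91 | 0.16 | −0.35 | 10
7 | 1 | 69,113 | −0.74 | −0.14 | −0.64 | 2
7 | 2 | 90,640 | −0.66 | −0.39 | −0.63 | 0
7 | 3 | 106,400 | −0.50 | −0.59 | −0.62 | 0
7 | 4 | 77,866 | −0.27 | −0.71 | −0.64 | 0
7 | 5 | 68,054 | 0.00 | −0.75 | −0.65 | 0
7 | 6 | 77,866 | 0.27 | −0.71 | −0.64 | 1
7 | 7 | 106,400 | 0.50 | −0.59 | −0.62 | 1
7 | 8 | 90,640 | 0.66 | −0.39 | −0.63 | 2
7 | 9 | 69,113 | 0.74 | −0.14 | −0.64 | 4
7 | 10 | 72,059 | 0.74 | 0.13 | −0.64 | 6
7 | 11 | 90,640 | 0.66 | 0.39 | −0.63 | 11
7 | 12 | 106,400 | 0.50 | 0.59 | −0.62 | 11
7 | 13 | 77,866 | 0.27 | 0.71 | −0.64 | 11
7 | 14 | 68,054 | 0.00 | 0.75 | −0.65 | 12
7 | 15 | 77,866 | −0.27 | 0.71 | −0.64 | 12
7 | 16 | 106,400 | −0.50 | 0.59 | −0.62 | 12
7 | 17 | 90,640 | −0.66 | 0.39 | −0.63 | 12
7 | 18 | 72,059 | −0.74 | 0.13 | −0.64 | 8
8 | 1 | 32,634 | −0.52 | −0.10 | −0.84 | 2
8 | 2 | 33,681 | −0.45 | −0.26 | −0.84 | 0
8 | 3 | 33,747 | −0.34 | −0.40 | −0.84 | 0
8 | 4 | 33,819 | −0.18 | −0.49 | −0.84 | 0
8 | 5 | 33,647 | 0.00 | −0.53 | −0.84 | 0
8 | 6 | 33,819 | 0.18 | −0.49 | −0.84 | 0
8 | 7 | 33,747 | 0.34 | −0.40 | −0.84 | 0
8 | 8 | 33,681 | 0.45 | −0.26 | −0.84 | 1
8 | 9 | 32,634 | 0.52 | −0.10 | −0.84 | 2
8 | 10 | 35,031 | 0.52 | 0.09 | −0.84 | 4
8 | 11 | 33,681 | 0.45 | 0.26 | −0.84 | 5
8 | 12 | 33,747 | 0.34 | 0.40 | −0.84 | 8
8 | 13 | 33,819 | 0.18 | 0.49 | −0.84 | 11
8 | 14 | 33,647 | 0.00 | 0.53 | −0.84 | 11
8 | 15 | 33,819 | −0.18 | 0.49 | −0.84 | 12
8 | 16 | 33,747 | −0.34 | 0.40 | −0.84 | 13
8 | 17 | 33,681 | −0.45 | 0.26 | −0.84 | 11
8 | 18 | 35,031 | −0.52 | 0.09 | −0.84 | 5
9 | 1 | 6,985 | −0.24 | −0.05 | −0.97 | 2
9 | 2 | 7,750 | −0.20 | −0.12 | −0.97 | 0
9 | 3 | 7,806 | −0.15 | −0.18 | −0.97 | 0
9 | 4 | 7,832 | −0.08 | −0.22 | −0.97 | 0
9 | 5 | 7,829 | 0.00 | −0.23 | −0.97 | 0
9 | 6 | 7,832 | 0.08 | −0.22 | −0.97 | 0
9 | 7 | 7,806 | 0.15 | −0.18 | −0.97 | 0
9 | 8 | 7,750 | 0.20 | −0.12 | −0.97 | 1
9 | 9 | 6,985 | 0.24 | −0.05 | −0.97 | 1
9 | 10 | 8,776 | 0.22 | 0.04 | −0.97 | 1
9 | 11 | 7,750 | 0.20 | 0.12 | −0.97 | 3
9 | 12 | 7,806 | 0.15 | 0.18 | −0.97 | 2
9 | 13 | 7,832 | 0.08 | 0.22 | −0.97 | 4
9 | 14 | 7,829 | 0.00 | 0.23 | −0.97 | 4
9 | 15 | 7,832 | −0.08 | 0.22 | −0.97 | 7
9 | 16 | 7,806 | −0.15 | 0.18 | −0.97 | 6
9 | 17 | 7,750 | −0.20 | 0.12 | −0.97 | 3
9 | 18 | 8,776 | −0.22 | 0.04 | −0.97 | 2
10 | 0 | 100 | 0.00 | 0.00 | 1.00 | 0
10 | 1 | 100 | 0.00 | 0.00 | −1.00 | 0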

Notes:

a The serial number of the grid on the axis of the zenith angle (θ).

b The serial number of the grid on the axis of the azimuth angle (φ). Grid(10, 0) corresponds to the grid with θ = 0, namely, (Wh, We, Ws) = (0, 0, 1). Grid(10, 1) corresponds to the grid with θ = 180, namely, (Wh, We, Ws) = (0, 0, −1).

c The number of weight-combinations belonging to the grid.

d The averaged weight value for the hydrophobicity in the grid.

e The averaged weight value for the electrostatic potential in the grid.

f The averaged weight value for the shape in the grid.

g The number of entries with N > 0, where N is the number of weight-combinations that could rank the near-native models in the top 10.
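The averaged weights in Table S3 are consistent with a spherical-coordinate layout in which Ws = cos θ, Wh = sin θ cos φ and We = sin θ sin φ, with θ divided into nine 20° bands from 0° to 180° and φ into eighteen 20° bands starting at −180°. This layout is inferred here from the tabulated values and is an assumption, not a statement from this section. Under that assumption, a weight-combination can be mapped to its grid serial numbers as sketched below (`grid_of` is a hypothetical name).

```python
import math


def grid_of(wh, we, ws, band_deg=20.0):
    """Map a weight-combination (Wh, We, Ws) to its (theta, phi) grid serials.

    Assumes 20-degree bands, theta in [0, 180] and phi in [-180, 180).
    The two poles are returned as the special grids (10, 0) and (10, 1).
    """
    norm = math.sqrt(wh * wh + we * we + ws * ws)
    if norm == 0:
        raise ValueError("the weight vector must not be all zeros")
    if wh == 0 and we == 0:
        return (10, 0) if ws > 0 else (10, 1)
    theta = math.degrees(math.acos(ws / norm))   # zenith angle, 0..180
    phi = math.degrees(math.atan2(we, wh))       # azimuth angle, -180..180
    theta_serial = min(int(theta // band_deg), 8) + 1
    phi_serial = int(((phi + 180.0) % 360.0) // band_deg) + 1
    return theta_serial, phi_serial


# The four selected scoring functions fall in the following grids of Table S3.
for name, w in {"f1": (0.34, 0.40, 0.84), "f2": (-0.27, 0.71, -0.64),
                "f3": (0.74, 0.13, -0.64), "f4": (-0.52, -0.10, 0.84)}.items():
    print(name, grid_of(*w))
```

The averaged weights of those four grids in Table S3 match the values quoted for f1 to f4 in the main text, which supports the assumed layout.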

  66 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  ZDOCK: an initial-stage protein-docking algorithm.

Authors:  Rong Chen; Li Li; Zhiping Weng
Journal:  Proteins       Date:  2003-07-01

3.  Structural characterisation and functional significance of transient protein-protein interactions.

Authors:  Irene M A Nooren; Janet M Thornton
Journal:  J Mol Biol       Date:  2003-01-31       Impact factor: 5.469

4.  Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations.

Authors:  Jeffrey J Gray; Stewart Moughon; Chu Wang; Ora Schueler-Furman; Brian Kuhlman; Carol A Rohl; David Baker
Journal:  J Mol Biol       Date:  2003-08-01       Impact factor: 5.469

5.  New algorithm to model protein-protein recognition based on surface complementarity. Applications to antibody-antigen docking.

Authors:  P H Walls; M J Sternberg
Journal:  J Mol Biol       Date:  1992-11-05       Impact factor: 5.469

6.  Assessing predictions of protein-protein interaction: the CAPRI experiment.

Authors:  Joël Janin
Journal:  Protein Sci       Date:  2005-02       Impact factor: 6.725

7.  Analyses of homo-oligomer interfaces of proteins from the complementarity of molecular surface, electrostatic potential and hydrophobicity.

Authors:  Yuko Tsuchiya; Kengo Kinoshita; Haruki Nakamura
Journal:  Protein Eng Des Sel       Date:  2006-07-12       Impact factor: 1.650

8.  Docking of protein molecular surfaces with evolutionary trace analysis.

Authors:  Eiji Kanamori; Yoichi Murakami; Yuko Tsuchiya; Daron M Standley; Haruki Nakamura; Kengo Kinoshita
Journal:  Proteins       Date:  2007-12-01

9.  The performance of ZDOCK and ZRANK in rounds 6-11 of CAPRI.

Authors:  Kevin Wiehe; Brian Pierce; Wei Wei Tong; Howook Hwang; Julian Mintseris; Zhiping Weng
Journal:  Proteins       Date:  2007-12-01

10.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205
