Guillaume Launay, Raul Mendez, Shoshana Wodak, Thomas Simonson.
Abstract
BACKGROUND: In structural genomics, an important goal is the detection and classification of protein-protein interactions, given the structures of the interacting partners. We have developed empirical energy functions to identify native structures of protein-protein complexes among sets of decoy structures. To understand the role of amino acid diversity, we parameterized a series of functions, using a hierarchy of amino acid alphabets of increasing complexity, with 2, 3, 4, 6, and 20 amino acid groups. Compared to previous work, we used the simplest possible functional form, with residue-residue interactions and a stepwise distance-dependence. We used increased computational resources, however, constructing 290,000 decoys for 219 protein-protein complexes, with a realistic docking protocol where the protein partners are flexible and interact through a molecular mechanics energy function. The energy parameters were optimized to correctly assign as many native complexes as possible. To resolve the multiple minimum problem in parameter space, over 64,000 starting parameter guesses were tried for each energy function. The optimized functions were tested by cross validation on subsets of our native and decoy structures, by blind tests on series of native and decoy structures available on the Web, and on models for 13 complexes submitted to the CAPRI structure prediction experiment.
Year: 2007 PMID: 17662112 PMCID: PMC2034607 DOI: 10.1186/1471-2105-8-270
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
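The abstract describes a residue-residue energy with a stepwise distance dependence, evaluated over reduced amino acid alphabets. A minimal sketch of that idea for the two-class (H/P) alphabet follows. The class membership is taken from the two-class parameter table in this record; the pair energies, the single-step 8 Å cutoff, and all function names are illustrative assumptions, not the authors' parameters or code.

```python
from itertools import product
from math import dist

# Two-class alphabet as listed in the two-class parameter table of this record.
HYDROPHOBIC = set("ALSVTPIGMCFYW")
POLAR = set("EKRHDNQ")

# Placeholder pair energies in kcal/mol: illustrative only, NOT the published values.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.5, ("P", "P"): -0.2}

# Assumed contact cutoff in angstroms (hypothetical; the record does not state it).
CUTOFF = 8.0


def residue_class(aa):
    """Map a one-letter amino acid code to its class, 'H' or 'P'."""
    if aa in HYDROPHOBIC:
        return "H"
    if aa in POLAR:
        return "P"
    raise ValueError(f"unknown amino acid code: {aa!r}")


def pair_energy(aa_a, aa_b):
    """Look up the symmetric class-class energy for two residues."""
    key = tuple(sorted((residue_class(aa_a), residue_class(aa_b))))
    return PAIR_ENERGY[key]


def interface_energy(chain_a, chain_b, cutoff=CUTOFF):
    """Sum pair energies over inter-chain residue pairs within the cutoff.

    Each chain is a list of (one_letter_code, (x, y, z)) tuples, with one
    representative point per residue. The step function is the simplest case:
    a pair contributes its full energy inside the cutoff and nothing outside.
    """
    total = 0.0
    for (aa_a, xyz_a), (aa_b, xyz_b) in product(chain_a, chain_b):
        if dist(xyz_a, xyz_b) <= cutoff:
            total += pair_energy(aa_a, aa_b)
    return total
```

A larger alphabet only changes `residue_class` and the size of `PAIR_ENERGY`; the summation over contacting pairs stays the same.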
Figure 1. Hierarchical Amino Acid Classification Tree. Hierarchical clustering of amino acid types, using the Blosum50 similarity matrix (right) or the optimized, 20-class energy matrix (left). The Pearson correlation coefficient of each cluster is given for the left-hand tree.
Fold recognition: performance of the optimized energy functions on the Optimization and Test sets
| Number of amino acid groups | Optimization Set | | Test Set | |
| 2 | 74.3 | 69.8 | 74.4 | 70.0 |
| 3 | 76.4 | 72.2 | 75.3 | 72.0 |
| 4 | 87.5 | 84.4 | 87.4 | 84.6 |
| 6 | 88.3 | 86.7 | 91.9 | 89.4 |
| 20 | 94.4 | 91.9 | 91.3 | 88.4 |
Two- and three-class energy parameters for fold recognition and dimer interface recognition. Best parameters (kcal/mol) for monomeric fold recognition (upper right) and dimer interface recognition (lower left).
| H | P | ||||
| -8.5 | 9.0 | H = {ALSVTPIGMCFYW} | |||
| H | -9.0 | -3.5 | P = {EKRHDNQ} | ||
| P | 9.8 | -7.1 | |||
| H | P | ||||
| H | H | P | |||
| -4.7 | -9.6 | 6.4 | H | ||
| H | -3.8 | -11.4 | 1.5 | H | |
| H | -8.4 | -14.3 | -1.9 | P = {EKRHDNQ} | |
| P | 4.8 | 2.4 | -1.5 | ||
| H | H | P | |||
Four- and six-class energy parameters for fold recognition and dimer interface recognition. Best four- and six-class energy parameters (kcal/mol) for monomeric fold recognition (upper right) and dimer interface recognition (lower left).
| H | H | H | P | |||||
| -6.96 | 1.81 | -8.85 | 1.94 | H | ||||
| H | -5.70 | -0.59 | 0.15 | 1.56 | H | |||
| H | -0.60 | -0.02 | -6.41 | 1.53 | H | |||
| H | -3.68 | -0.61 | -7.08 | -1.30 | P = {EDNQKRH} | |||
| P | 1.75 | 0.04 | 0.63 | 1.21 | ||||
| H | H | H | P | |||||
| H | H | H | G | |||||
| -8.30 | 0.30 | -5.11 | 3.77 | 2.62 | 0.43 | H | ||
| H | -2.06 | -1.35 | 1.49 | 0.42 | 1.30 | 2.54 | H | |
| H | 0.09 | -0.02 | -7.79 | 1.01 | 0.79 | -0.85 | H | |
| H | -1.73 | -0.45 | -0.65 | 1.61 | -3.30 | 1.00 | P | |
| P | 0.76 | 0.13 | 0.36 | 0.52 | 1.57 | -0.38 | P | |
| P | 0.51 | -0.24 | -0.25 | -0.30 | 0.53 | -0.08 | G | |
| G | -0.17 | 0.12 | -0.55 | 0.04 | 0.19 | 0.01 | ||
| H | H | H | P | P | G | |||
Figure 2. Discrimination according to Protein Size. Discrimination power of the different amino acid alphabets for fold recognition as a function of protein length (number of amino acids). The corresponding energy functions are those derived for fold recognition, using the Monomeric Optimization Set. The mean number of decoys is shown vs. protein length (grey bars; right-hand graduations).
Figure 3. Characterizing the dimeric protein complexes. Upper panel: the distribution of interface sizes among the 219 complexes. Lower panel: the propensity P_X of each amino acid type X to be found in the interface (Eq. 1). Positive (negative) values correspond to types that are overrepresented (underrepresented) in the interfaces, compared to their abundance in the SwissProt data bank. The two shades of grey correspond to the binary amino acid classification.
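Eq. 1 is not reproduced in this record, so the exact definition of the interface propensity is unknown here. As a hedged illustration only, the sketch below assumes a common log-ratio form, the logarithm of the interface frequency of type X divided by its background frequency, which matches the caption's sign convention (positive for overrepresented types). The function name and inputs are hypothetical.

```python
from collections import Counter
from math import log


def interface_propensities(interface_residues, background_freq):
    """Log-ratio propensity per amino acid type (assumed form, not Eq. 1 itself).

    interface_residues: iterable of one-letter codes observed in interfaces.
    background_freq: dict mapping one-letter code to its reference abundance
                     (for example, database-wide frequencies).
    Types that never occur in the interface are omitted, since their
    log-ratio would be undefined.
    """
    counts = Counter(interface_residues)
    total = sum(counts.values())
    return {
        aa: log((n / total) / background_freq[aa])
        for aa, n in counts.items()
    }
```

With this convention, a type that is twice as frequent in interfaces as in the background gets a propensity of ln 2, and one that is half as frequent gets −ln 2.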
Figure 4. Coverage of the surface of the receptors. Two examples of decoy structure series, along with their native structures: 1ARO (top) and 1BJF (bottom). In each case, the native complex is shown. One partner (the 'A' receptor) is arbitrarily taken as a reference (orange ribbon). The second partner (the 'B' ligand) is shown as a cyan tube. The decoys corresponding to B are schematized by sticks, constructed from the center of mass and an arbitrary atom in the decoy structure.
Figure 5. Coverage of the surface of the receptors. Surface coverage of one partner by the other for the (homodimeric) 1AA7 decoy series (total of 1507 decoys). For each surface amino acid in one partner, we show the number of decoys where it is part of the interface (i.e., buried by the other partner). Polar/nonpolar amino acids are in black/grey. We see that all the amino acids at the protein surface participate in the interfaces of many decoys.
Dimer interface recognition: performance of the energy functions
| Number of amino acid groups | Optimization Set 1 | | | | Optimization Set 2 | | | |
| | Opt. | | Test | | Opt. | | Test | |
| 2 | 89.4 | 90.7 | 87.5 | 46.7 | 90.1 | 90.1 | 87.2 | 47.3 |
| 3 | 93.0 | 92.7 | 89.0 | 50.5 | 93.0 | 92.7 | 89.9 | 56.4 |
| 4 | 97.9 | 96.7 | 93.5 | 60.7 | 97.9 | 97.2 | 94.6 | 68.2 |
| 6 | 98.0 | 97.0 | 93.7 | 63.5 | 98.0 | 97.7 | 95.4 | 69.1 |
| 20 | 99.1 | 97.6 | 93.9 | 67.3 | 98.7 | 98.2 | 94.8 | 69.1 |
| 20/Bastolla | - | - | 96.5 | 56.4 | - | - | 93.3 | 62.6 |
| 20/Lu | - | - | 93.6 | 63.6 | - | - | 93.5 | 63.5 |
Results for the parameter sets produced with OS1. Results for the parameter sets produced with OS2. Results on the Optimization set. Results on the Test set. D and D3 include only the performance for interface recognition (the fold recognition statistics are left out). With the Bastolla or Lu energy functions.
Figure 6. Spectra of the distribution of the energy of association for different series of decoys. Native and decoy energy distributions for three complexes: 1AR0 (top), 1BJF (middle) and 1FBT (bottom), using the energy functions with two (left), six (middle), and 20 amino acid classes (right). The native energy is shown as a thin bar. Thick curves in the left-hand panels correspond to a random energy model for decoy energies; see main text. Our decoy energies are significantly more diverse than the random energy model.
Figure 7. Decoy energies versus deviation from native structure. Decoy energies as a function of the structural deviation from native for three complexes: 1AR0 (top), 1BJF (middle) and 1FBT (bottom), using the energy function with 20 amino acid classes. The native energy is shown as a large dot (lower left). The structural deviation is measured by the root-mean-square deviation (RMSD) in Cα positions.
Figure 8. Contour representations of the interaction parameters for selected energy functions. Top row: energy functions with two, three and four amino acid classes, optimized for interface recognition using Optimization Set 2. Bottom row: six- and 20-class functions optimized over OS1 or OS2, as indicated. Contour levels in kcal/mol. Each energy matrix has a mean of zero. The amino acids belonging to each class are shown next to the corresponding rows and columns of each matrix. For the sake of clarity, the rows and columns of the 20-class matrix are not labelled individually but groupwise.
Four- and six-class parameters averaged over the complete energy matrix (standard deviation in parentheses), in kcal/mol.
| H | H | H | P | P | G | ||
| -2.03(0.07) | 0.13(0.06) | -1.70(0.05) | 0.79(0.04) | 0.54(0.04) | -0.13(0.03) | H | |
| 0.01(0.05) | -0.43(0.05) | 0.16(0.05) | -0.22(0.05) | 0.11(0.04) | H | ||
| H | -2.03(0.07) | -0.65(0.02) | 0.37(0.05) | -0.22(0.06) | -0.56(0.06) | H | |
| H | 0.08(0.11) | 0.05(0.07) | 0.55(0.05) | -0.27(0.07) | 0.03(0.07) | P | |
| H | -1.70(0.05) | -0.45(0.07) | -0.65(0.02) | 0.56(0.08) | 0.24(0.13) | P | |
| P | 0.71(0.10) | 0.02(0.19) | 0.12(0.30) | 0.15(0.42) | 0.1(0.) | G | |
| H | H | H | P |
Energy rank for heterodimeric structures. Energy rank of the native structure, compared to its decoys, using various energy functions. 23 heterodimers with their decoys, not used in the parameterization. The 20- and 6-class energy functions used are the ones optimized on OS2. Heterodimers in the TS1 and TS2 data sets. For the structures in TS1 (bottom 11), the OS1 energy functions are used; for those in TS2 (top 6), the OS2 functions are used (i.e., we show cross-validated results). Fraction of successful series. D and D3 correspond to strong discrimination (native structure ranked first) and weak discrimination (native ranked among top three; see Methods).
| PDB ID | 20Cl | Bastolla | 6Cl | PDB ID | 20Cl | Bastolla | 6Cl |
| 1 | 1 | 1 | 4 | 2 | 3 | ||
| 1 | 1 | 1 | 1 | 1 | 1 | ||
| 1 | 1 | 1 | 3 | 3 | 2 | ||
| 2 | 1 | 2 | 3 | 6 | 3 | ||
| 4 | 4 | 4 | 3 | 4 | 3 | ||
| 1 | 1 | 1 | 2 | 47 | 1 | ||
| 8 | 9 | 7 | 1 | 3 | 1 | ||
| 1 | 1 | 1 | 1 | 1 | 1 | ||
| 5 | 3 | 5 | 1 | 1 | 1 | ||
| 3 | 2 | 4 | 67 | 145 | 36 | ||
| 1 | 1 | 1 | 19 | 4 | 9 | ||
| 20 | 8 | 16 | 1 | 1 | 1 | ||
| 1 | 1 | 1 | 1 | 1 | 3 | ||
| 1 | 1 | 1 | 1 | 1 | 1 | ||
| 3 | 3 | 4 | 1 | 2 | 1 | ||
| 4 | 4 | 2 | 14 | 69 | 2 | ||
| 1 | 10 | 1 | 2 | 5 | 1 | ||
| 10 | 1 | 9 | 8/17 | 6/17 | 9/17 | ||
| 3 | 1 | 2 | 13/17 | 9/17 | 15/17 | ||
| 1 | 1 | 1 | |||||
| 1 | 1 | 1 | |||||
| 1 | 2 | 1 | |||||
| 1 | 2 | 1 | |||||
| 5 | 6 | 5 | |||||
| 13/24 | 13/24 | 13/24 | |||||
| 17/24 | 18/24 | 16/24 | |||||
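The D and D3 criteria summarized in the last two rows of the table above can be sketched as follows. The rank of a native structure is taken as one plus the number of decoys with strictly lower energy; that tie-breaking rule and the function names are assumptions, since the record does not spell them out. The example ranks are the 17 values listed in the right-hand 20Cl column of the table.

```python
def native_rank(native_energy, decoy_energies):
    """Rank of the native structure among its decoys (1 = lowest energy).

    Assumed convention: decoys with exactly the native energy do not
    push the native structure down the ranking.
    """
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


def discrimination_counts(ranks):
    """Return (D, D3, N): series with native ranked first, in the top three, and in total."""
    d = sum(1 for r in ranks if r == 1)
    d3 = sum(1 for r in ranks if r <= 3)
    return d, d3, len(ranks)


# Ranks copied from the right-hand 20Cl column of the table above.
ranks_20cl = [4, 1, 3, 3, 3, 2, 1, 1, 1, 67, 19, 1, 1, 1, 1, 14, 2]
d, d3, n = discrimination_counts(ranks_20cl)
print(f"D = {d}/{n}, D3 = {d3}/{n}")  # prints "D = 8/17, D3 = 13/17"
```

The printed fractions match the 8/17 and 13/17 entries reported in the table for that column.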
Blind tests of dimer interface recognition: the rank of the native structure. The top (1AVZ-2SIC) and middle groups are the Sternberg and Vakser test sets, respectively (see Methods). The lower group (T04–T19) are the 2005 CAPRI target structures. Columns 2–5 correspond to the energy functions optimized with OS1 and OS2, using 20, 4, or 6 amino acid classes, as indicated. Energy functions from Refs. [19], [20]. Numbers in bold show cases where the native structure is ranked among the top ten structures.
| PDB ID | 20Cl OS1 | 20Cl OS2 | 6Cl OS2 | 4Cl OS2 | Lu | Bastolla |
| 73 | 35 | 67 | 100 | 54 | 40 | |
| 65 | 62 | 39 | 16 | |||
| 76 | 37 | 41 | 31 | 70 | ||
| 56 | 18 | |||||
| 100 | 100 | 100 | 100 | 78 | 100 | |
| 24 | ||||||
| 11 | 20 | 71 | ||||
| 99 | 100 | 100 | 100 | 97 | 98 | |
| 20 | 72 | 82 | 79 | 25 | 23 | |
| 17 | 48 | 56 | 17 | 14 | 96 | |
| 33 | ||||||
| 31 | 58 | 52 | 27 | 20 | 17 | |
| T04 | 11 | 11 | ||||
| T05 | 64 | 61 | 62 | 62 | 60 | 60 |
| T06 | ||||||
| T07 | 58 | 63 | 62 | 56 | 58 | 64 |
| T08 | 84 | 52 | 68 | 96 | 136 | 145 |
| T09 | 165 | 162 | 164 | 165 | 159 | 164 |
| T11 | ||||||
| T12 | ||||||
| T13 | 194 | 176 | 179 | 175 | 174 | 175 |
| T14 | 68 | 121 | 91 | 125 | 18 | 43 |
| T15 | 12 | |||||
| T18 | 25 | |||||
| T19 | 38 | 22 | 37 | |||
Native complex discrimination and residue contacts at the interfaces of submitted and target CAPRI structures
| Interface contacts | |||||
| Target number | Native rank | HH | HP | PP | Top decoy's interface contact number |
| T04 | 53 (49.8) | 32 (39.7) | 15 (10.4) | 1.1 | |
| T05 | 64 61 | 42 (56.8) | 51 (34.4) | 7 (8.7) | 1.4 |
| T06 | 46 (43.1) | 33 (37.1) | 21 (19.9) | 1.0 | |
| T07 | 58 62 | 23 (25.7) | 43 (44.9) | 35 (29.3) | 1.3 |
| T08 | 84 61 | 7 (40.2) | 77 (42.6) | 16 (17.1) | 4.3 |
| T09 | 165 162 | 27 (39.2) | 53 (42.1) | 20 (18.6) | 2.2 |
| T11 | 48 (42.5) | 41 (42.8) | 11 (14.6) | 1.0 | |
| T12 | 48 (53.5) | 41 (36.9) | 11 (9.5) | 1.1 | |
| T13 | 194 176 | 65 (64.2) | 33 (30.5) | 2 (5.3) | 1.0 |
| T14 | 68 121 | 26 (31.4) | 51 (44.2) | 23 (24.4) | 0.6 |
| T15 | 12 | 18 (17.3) | 44 (48.6) | 38 (14.5) | 0.8 |
| T18 | 25 | 36 (49.3) | 49 (43.1) | 14 (7.6) | 1.0 |
| T19 | 36 (45.1) | 44 (40.0) | 20 (14.9) | 0.6 | |
Values for the 20Cl/OS1, OS2 functions, already given in Table 7. The percentage of interface contacts of each type, averaged over the 99 decoy structures: hydrophobic-hydrophobic (HH), hydrophobic-polar (HP), polar–polar (PP); values for the native structure in parentheses. The relative contact number F = N/N(Eq. 2) of the decoy ranked first by the 20-class OS2 energy function.