Mainak Guharoy, Pinak Chakrabarti.
Abstract
BACKGROUND: Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues, and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved than those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three dimensions.
Year: 2010 PMID: 20507585 PMCID: PMC2894039 DOI: 10.1186/1471-2105-11-286
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distribution of M. In (B) the antibody-antigen complexes are marked as triangles.
Parameters delineating the clustering of conserved residues in interfaces
| Interface type | Conserved-residue criterion | Average Ms,cons ᵃ | Average Ms,int ᵃ | Average ρ ᵃ | Total number of interfaces ᵇ | Interfaces with Ms,cons greater than Ms,int ᵇ | P-value ᶜ |
|---|---|---|---|---|---|---|---|
| Homodimers | s < ⟨s⟩int | 0.081 (0.02) | 0.071 (0.02) | 1.13 (0.08) | 121 | 117 | 1.57E-04 |
| Homodimers | s < (⟨s⟩int − σ) | 0.087 (0.02) | 0.070 (0.02) | 1.24 (0.20) | 103 | 94 | 1.93E-08 |
| Complexes | s < ⟨s⟩int | 0.102 (0.03) | 0.089 (0.02) | 1.14 (0.14) | 392 | 340 | 9.64E-14 |
| Complexes | s < (⟨s⟩int − σ) | 0.113 (0.04) | 0.090 (0.02) | 1.26 (0.30) | 309 | 252 | < 2.2E-16 |
| Complexes (antibody-antigen excluded) | s < ⟨s⟩int | 0.103 (0.03) | 0.088 (0.02) | 1.16 (0.14) | 348 | 313 | 4.86E-14 |
| Complexes (antibody-antigen excluded) | s < (⟨s⟩int − σ) | 0.115 (0.04) | 0.089 (0.02) | 1.28 (0.30) | 271 | 229 | < 2.2E-16 |
| Antibody-antigen complexes | s < ⟨s⟩int | 0.101 (0.02) | 0.097 (0.01) | 1.04 (0.15) | 44 | 23 | 0.59 |
| Antibody-antigen complexes | s < (⟨s⟩int − σ) | 0.103 (0.03) | 0.097 (0.01) | 1.07 (0.29) | 38 | 21 | 0.57 |
Two sets of values are provided, corresponding to two different ways of identifying the subset of conserved residues (see Methods). In the first, conserved residues in each interface are those whose sequence entropy values (calculated using Eq. 1) are lower than the mean sequence entropy (⟨s⟩int) for that interface; in the second, conserved residues have s < (⟨s⟩int − σ), σ being the standard deviation of the s values over all residues in that particular interface. The first method was also repeated using Eq. 1a (instead of Eq. 1) to calculate sequence entropy, and the results are provided in square brackets.
a Standard deviations are in parentheses.
b Multiple sequence alignments were available in the HSSP database for all proteins in our datasets with the exception of one homodimer, and therefore the analysis could not be carried out for that interface. In total, 121 homodimeric interfaces and 408 interfaces belonging to 204 protein-protein complexes were analyzed. Since the subunit interfaces in homodimers are identical, the analysis was performed for only a single subunit in homodimers. For protein complexes, each of the two components was analyzed separately. The average numbers of aligned homologous sequences in the HSSP files were 768 and 1391 for homodimers and protein complexes, respectively, and the percentage sequence identities of the aligned proteins ranged between 30 and 100%. For 16 protein chains belonging to the dataset of complexes, all the interface residue positions in the multiple sequence alignments were fully conserved and therefore the average interface entropy was 0.0. This did not allow the identification of the subset of conserved residues within the whole set of interface residues, precluding the calculation of clustering of conserved residues relative to the whole interface. Therefore, the statistics are shown for the remaining 392 interfaces only. A smaller number of interfaces is reported in the second row of data (corresponding to Method 2) because the more stringent conservation condition excludes interfaces with 0 or 1 conserved residue.
c The non-parametric Mann-Whitney U-test was used to test for statistical significance of the hypothesis that Ms,cons is greater than Ms,int. P < 0.01 indicates that Ms,cons is significantly greater than Ms,int at the 1% level. All statistical calculations (including P-values) were implemented using R [64].
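The one-sided test described in footnote c can be illustrated with a minimal Python sketch. The authors used R; the version below is only an approximation under stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, so its P-values may differ slightly from those R reports. The function name `mann_whitney_greater` and the idea of passing per-interface Ms,cons and Ms,int values as the two samples are assumptions made for illustration.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U-test of H1: values in x tend to exceed those in y.

    Returns (U, p). The P-value uses the normal approximation with a tie
    correction and no continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the average rank.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p
```

With this convention, a result of P < 0.01 would be read as in footnote c: the first sample is significantly greater than the second at the 1% level.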
Figure 2. Histogram showing the percentage distribution of the ρ values for the interfaces in homodimers and protein-protein complexes.
Statistics showing the significance of clustering of conserved interface residues compared to the clustering in same-size subsets of randomly selected interface residues from the same structure
| Dataset | Selection of conserved interface residues ᵃ | Number | Ms,cons | ⟨Ms,random⟩ | P-value ᵇ |
|---|---|---|---|---|---|
| Homodimers | s < ⟨s⟩int | 121 | 0.081 | 0.071 | 1.6E-04 |
| Homodimers | s < (⟨s⟩int − σ) | 103 | 0.087 | 0.070 | 2.0E-08 |
| Complexes | s < ⟨s⟩int | 389 | 0.103 | 0.089 | 6.5E-14 |
| Complexes | s < (⟨s⟩int − σ) | 309 | 0.113 | 0.090 | 2.2E-16 |
a Conserved interface residues were selected using different criteria (see Methods and Table 1 footnote).
b P-values refer to the significance levels for the Mann-Whitney U-test corresponding to the hypothesis that Ms,cons is greater than ⟨Ms,random⟩ (i.e., the degree of clustering of conserved residues within an interface is greater than that of random, same-size subsets collected from the same interface). A P-value < 0.01 indicates that the difference is significant at the 1% level. Calculated using R [64].
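The comparison summarized in this table (conserved residues selected by either entropy criterion, then compared with random same-size subsets of the same interface) can be sketched as follows. This is a sketch under several loudly stated assumptions: the definition of Ms is not given in this excerpt, so `clustering_score` is a hypothetical stand-in (mean inverse pairwise distance), not the authors' formula; the population standard deviation is used for σ; and the number of random draws (`n_draws`) is an illustrative choice. Interfaces with fewer than two conserved residues return `None`, mirroring the exclusion described in the Table 1 footnote.

```python
import math
import random
import statistics


def select_conserved(entropy, stringent=False):
    """Indices of conserved residues in one interface.

    entropy: per-residue sequence entropies s for the interface.
    stringent=False applies s < mean(s); stringent=True applies
    s < mean(s) - sigma (population standard deviation is an assumption).
    """
    cutoff = statistics.fmean(entropy)
    if stringent:
        cutoff -= statistics.pstdev(entropy)
    return [i for i, s in enumerate(entropy) if s < cutoff]


def clustering_score(coords):
    """HYPOTHETICAL stand-in for Ms: mean inverse distance over residue pairs."""
    n = len(coords)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 / math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def compare_to_random(coords, entropy, stringent=False, n_draws=1000, seed=0):
    """Return (score of conserved subset, mean score of random same-size subsets).

    Returns None when fewer than two residues qualify as conserved.
    """
    conserved = select_conserved(entropy, stringent)
    if len(conserved) < 2:
        return None
    score_cons = clustering_score([coords[i] for i in conserved])
    rng = random.Random(seed)
    all_indices = range(len(coords))
    draws = [
        clustering_score([coords[i] for i in rng.sample(all_indices, len(conserved))])
        for _ in range(n_draws)
    ]
    return score_cons, statistics.fmean(draws)
```

Collecting the two returned values over all interfaces in a dataset would give the two samples that a one-sided Mann-Whitney U-test compares.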
Figure 3. Histogram of the distribution of the number of conserved residue sub-clusters in interfaces as a function of the interface size in (A) homodimers and (B) protein-protein complexes. The x-axis labels mark the middle of the range in each column. Bins are of size 400 Å² in (A) and 200 Å² in (B). In (A) only the interfaces of subunit A have been considered because of the identical nature of the two chains. In (B), each component of the protein complex has been considered separately.
Figure 4. Relative enrichment of the 20 amino acid types within conserved regions in protein-protein interfaces.
Figure 5. Ranking of the interface relative to all possible surface patches. Distribution of the degree of clustering of conserved residues within interfaces as compared to other surface patches for (A) homodimers, and (B) protein complexes. For each protein, the interface is ranked, relative to all other surface patches, as being in the top 10% (rank 1), 10-20% (rank 2), etc. according to the ρ value (Eq. 4). Methods 1-3 for generating the surface patches are described in Methods. (C) For each protein, the generated surface patch having the maximum overlap with the true interface is identified, and the distribution of the % overlap is plotted for all proteins belonging to the two datasets.
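The decile ranking described in this caption can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the handling of bin boundaries (the rank is based on the fraction of all patches, interface included, that have a strictly higher ρ) is an assumption, since the excerpt does not specify it.

```python
def decile_rank(interface_rho, patch_rhos):
    """Rank of the interface among surface patches by rho.

    Rank 1 means fewer than 10% of all patches (interface included) have a
    strictly higher rho, rank 2 means 10-20%, and so on up to rank 10.
    """
    n_better = sum(1 for r in patch_rhos if r > interface_rho)
    n_total = len(patch_rhos) + 1  # other patches plus the interface itself
    return (10 * n_better) // n_total + 1


def percent_rank1(ranks):
    """Percentage of proteins whose interface received rank 1."""
    return round(100.0 * sum(1 for r in ranks if r == 1) / len(ranks), 1)
```

For example, 65 rank-1 interfaces out of 121 gives 53.7%, the form in which the rank-1 counts are reported for each patch-generation method.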
Interface prediction accuracy, with heterocomplexes divided into functional classes
| Interface type (number) | Interfaces with Rank 1, Method 1: number (%) | Interfaces with Rank 1, Method 2: number (%) | Interfaces with Rank 1, Method 3: number (%) |
|---|---|---|---|
| Homodimers (121) | 65/121 (53.7) | 58/121 (47.9) | 51/121 (42.2) |
| Heterocomplexes (389) | 189/389 (48.6) | 196/389 (50.4) | 187/389 (48.1) |
| Enzyme-inhibitor (114) | 77/114 (67.5) | 80/114 (70.2) | 75/114 (65.8) |
| Antibody-antigen (41) | 10/41 (24.4) | 9/41 (22.0) | 10/41 (24.4) |
| Signaling complexes (78) | 32/78 (41.0) | 34/78 (43.6) | 32/78 (41.0) |
| Others (156) | 70/156 (44.9) | 73/156 (46.8) | 70/156 (44.9) |
Details of the slightly different methods used to generate the surface patches are provided in Methods. Eq. 1 was used to calculate sequence entropy; however, in a few cases (values in square brackets) calculations were also repeated using Eq. 1a.
Figure 6. Comparison of the clustering of conserved residues within the interface and other surface patches of human carboxypeptidase complexed to its inhibitor (PDB file, 1dtd). (A) The chain of interest (carboxypeptidase) is shown in spacefill (grey), its partner (inhibitor) in cartoon representation (yellow) in two different orientations. Conserved interface residues (on the enzyme) are colored green, the remaining interface residues are in blue. The partner protein is removed in the third view to clearly show the clustered nature of the conserved residues within the interface. (B) Diagram showing the construction of surface patches around each surface residue using a fixed cutoff of 15 Å (Method 1). (C) Sixteen different surface patches of the protein (in grey) are shown; in each of them the conserved residues (green) are scattered over the entire patch.
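The patch construction mentioned in panel (B), one patch around each surface residue with a fixed 15 Å cutoff, might look like the sketch below. The excerpt does not say which atoms or points the distance is measured between, so representing each surface residue by a single coordinate is an assumption, and `build_patches` and the residue identifiers are hypothetical names.

```python
import math


def build_patches(centers, cutoff=15.0):
    """Build one surface patch per surface residue.

    centers: dict mapping a residue id to one representative (x, y, z)
             coordinate in Å (the choice of representative point is an
             assumption of this sketch).
    Returns a dict mapping each residue id to the sorted ids of all surface
    residues within `cutoff` Å of it, the central residue included.
    """
    patches = {}
    for center_id, center_xyz in centers.items():
        patches[center_id] = sorted(
            other_id
            for other_id, other_xyz in centers.items()
            if math.dist(center_xyz, other_xyz) <= cutoff
        )
    return patches
```

Each resulting patch could then be scored for the clustering of its conserved residues in the same way as the true interface, which is what allows the interface to be ranked against all other patches.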