Insights from the structural analysis of protein heterodimer interfaces.

Gopichandran Sowmya, Sathyanarayanan Anita, Pandjassarame Kangueane.

Abstract

Protein heterodimer complexes are often involved in catalysis, regulation, assembly, immunity and inhibition. This involves the formation of stable interfaces between the interacting partners. Hence, it is of interest to describe heterodimer interfaces using known structural complexes. We use a non-redundant dataset of 192 heterodimer complex structures from the Protein Data Bank (PDB) to identify interface residues and describe their interfaces using amino acid residue property preferences. Analysis of the dataset shows that heterodimer interfaces are often abundant in polar residues. The analysis also shows the presence of two classes of interfaces in heterodimer complexes. The first class of interfaces (class A), with more polar residues than the core but fewer than the surface, is already known. These interfaces are more hydrophobic than surfaces, and protein-protein binding is largely hydrophobic. The second class of interfaces (class B), with more polar residues than both core and surface, is shown here. These interfaces are more polar than surfaces, and binding is mainly polar. Thus, these findings provide insights into the understanding of protein-protein interactions.

Keywords: core; heterodimer; interface; polar abundance; protein-protein interaction (PPI); surface

Year: 2011 PMID: 21572879 PMCID: PMC3092946 DOI: 10.6026/97320630006137

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

The formation of protein complexes by two different proteins (heterodimers) involves a stable interface. The driving forces that determine the chemical and physical features of the interface are essential for molecular function. However, our current knowledge of the molecular principles of protein-protein binding is limited. Hence, the identification of a binding partner from sequence alone is still a great challenge. Therefore, it is important to document interface residue types in heterodimers using an updated yet non-redundant dataset of structures determined by X-ray crystallography. The description of interfaces using amino acid residues and their types helps to understand protein-protein interaction (PPI). The principles of PPI gleaned from the analysis of protein complexes determined by X-ray crystallography have been documented in the literature [1-19]. These studies described PPI using various structural (e.g. interface area, interface size, gap index, volume, planarity, hydrogen bonds, etc.) and sequence (e.g. protein size, residue type, residue frequency, conserved interface patterns, etc.) property parameters. They provide mean statistics on interface features for large datasets, which has given valuable insights into protein-protein interactions. However, protein-protein interaction is specific and every interface is unique. Hence, it is important to classify known protein complexes based on interfaces. The classical work by Chothia & Janin (1975) showed that protein interfaces are dominantly hydrophobic [1]. It was later detailed by Jones & Thornton (1995) that interfaces have more hydrophobic residues than the surface but fewer than the core [2]. The role of interface hydrophobic residues in binding was also later acknowledged by Tsai et al. (1997) [3]. It was found that large and strong hydrophobic patches are dominating features at the interface [4].
The use of a hydrophobic mean-field potential for protein subunit docking was also subsequently demonstrated [5]. Hydrophobic interfaces with few charged groups have been described [6]. This study also documented that interface residues are either “abundantly polar” or “abundantly hydrophobic”. The presence of distinctly clustered yet conserved residues at the interface is known [7]. Interfaces have also been described using features (e.g. protein size, interface size, interface area, gap volume, gap index, planarity, hydrogen bonds, salt bridges, residue propensity, etc.) based on mean statistics for large datasets [8, 9, 10, 11, 12, 13]. Online web servers are also available for studying PPI using these features [14, 15, 16]. Thus, progress in understanding the molecular principles of protein-protein binding has been considerable. It should be stated that these studies use datasets consisting of both heterodimers and homodimers. The formation of homodimers and their folding through 2-state (2S, without intermediate) and 3-state (3S, with stable intermediate) mechanisms is distinct from that of heterodimers [20]. Therefore, our interest here is to study and understand heterodimer complexes only, using interface residue types. Moreover, it is known that non-specific interfaces are less pronounced in heterodimer complexes and hence, the need to distinguish true and false complexes is not compelling [9]. We use the percentage of polar residues to describe the interface in comparison with core and surface for 192 heterodimer complexes and to classify them into distinct classes.

Materials & Methodology

Heterodimers dataset

We created an updated yet non-redundant heterodimer dataset from the Protein Data Bank (PDB) [21]. The availability of precompiled datasets is described in the ProtorP [16] and PQS [22] online servers. However, ProtorP provides no option for download and PQS has not been updated since 1999. Therefore, it was essential to create an updated yet non-redundant heterodimer dataset from PDB (Table 1) using the procedure outlined in Figure 1. In this procedure, we downloaded 5,387 entries from the PDBelite web interface using the predefined keywords “hetero AND dimer” [23]. However, this dataset was redundant and corresponded to about 28,525 sequence chains. This is more than the expected 10,774 (5,387 × 2) due to the presence of multiple sequence chains (>2 chains) in several entries. Therefore, we extracted the PDB entries (984) with just two sequence chains. Thus, a sequence set of 1,968 sequences corresponding to 984 PDB entries was created. This dataset was redundant at the sequence level and hence was subjected to CD-HIT (a sequence redundancy removal program) [24] at a 40% sequence similarity cut-off (with step size n = 2). This resulted in 680 unique sequences corresponding to 457 PDB entries. It should be noted that the number of complexes is more than half of the number of chains. This is because the interface is a combination of two chains and thus, the interfaces are non-redundant. This set contained about 60 RNA/DNA, homodimer and HETATOM structures and these entries were removed. The remaining 397 protein complexes were further refined by removing entries with short peptides (chain length <= 50 residues) or resolution > 3.5 Å. This resulted in a non-redundant dataset of 192 heterodimer protein complexes (Table 1). The dataset was subsequently characterized for protein size distribution (Figure 2).
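
Two of the filtering steps described above (keeping entries with exactly two chains, then applying the chain length and resolution cut-offs) can be sketched as follows. This is only an illustrative sketch: the `Entry` record, its field names and the function names are hypothetical and not part of the original procedure, and the CD-HIT redundancy removal and the removal of RNA/DNA, homodimer and HETATOM entries are not covered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry (not a real PDB record format)."""
    pdb_id: str
    chain_lengths: Tuple[int, ...]  # number of residues in each protein chain
    resolution: float               # crystallographic resolution in Å


def is_two_chain(entry: Entry) -> bool:
    """Keep only entries with exactly two sequence chains."""
    return len(entry.chain_lengths) == 2


def passes_quality(entry: Entry, max_peptide_length: int = 50,
                   max_resolution: float = 3.5) -> bool:
    """Reject chains of <= 50 residues and structures with resolution > 3.5 Å."""
    long_enough = all(n > max_peptide_length for n in entry.chain_lengths)
    return long_enough and entry.resolution <= max_resolution


def filter_entries(entries: List[Entry]) -> List[Entry]:
    """Apply the two-chain filter followed by the length and resolution filter."""
    return [e for e in entries if is_two_chain(e) and passes_quality(e)]
```

In the paper the length and resolution cut-offs are applied after redundancy removal; the sketch applies them directly only to keep the example short.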

Figure 1

A flowchart for the creation of a non-redundant heterodimer dataset. PDB = Protein databank.

Figure 2

Characterization of the dataset based on protein size.

Source organism based grouping

Each heterodimer complex is made up of two protein monomer subunits. The source for each protein subunit is either different (different organism (DO)) or the same (same organism (SO)) (Table 1). The formation of a protein complex with interacting partners from DO is possible only in heterodimers, often for a non-essential (non-obligatory, e.g. inhibitory) role. Thus, the dataset is divided based on the organism source of the interacting partners. The dataset also contains five complexes with at least one synthetic partner (SP).

Functional grouping of complexes

We extracted “descriptive” functional data (usually semantic) for each complex from the PDB header annotation records. These data were manually curated (“by domain expert decision”) through visual inspection using available literature information. Thus, complexes were generally grouped based on function into catalysis (enzymes), regulatory (cellular), assembly (structural), immunity and inhibitory (Table 1). It should be noted that this exercise is not comprehensive. However, we have taken reasonable effort on a case by case basis to classify complexes into their respective functional groups. Manual inspection of PDB description records suggests that DO complexes are often inhibitory (e.g. PDB code: 1K9O) or immune (e.g. PDB code: 1GH6) related (Table 3). However, SO complexes are associated with catalysis, regulation, assembly and immunity. The SP group consists of complexes with a synthetic partner for in vitro inhibitory or regulatory studies. A complex may often align with two different functional groups; such complexes are grouped based on an “expert decision” using known information.

Accessible surface area (ASA)

ASA was calculated using the Windows software Surface Racer [25] with the Lee and Richards (1971) [26] implementation. A probe radius of 1.4 Å was used for ASA calculation.
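
To illustrate what a probe-based ASA calculation involves, the following is a minimal sketch of the Shrake-Rupley point-sampling method. This is a deliberately swapped-in, simpler technique and not the Lee and Richards algorithm used by Surface Racer; the function names, the number of sample points and the example atomic radius are assumptions made for illustration. Only the 1.4 Å probe radius is taken from the text.

```python
import math


def sphere_points(n=960):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def atom_asa(atoms, probe=1.4, n_points=960):
    """Shrake-Rupley style ASA per atom, in Å².

    atoms: list of (x, y, z, radius) tuples, coordinates and radii in Å.
    A sample point on the probe-expanded sphere of an atom counts as
    accessible when it lies outside the expanded spheres of all other atoms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        reach = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + reach * ux, yi + reach * uy, zi + reach * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * reach * reach * exposed / n_points)
    return areas
```

For an isolated atom every sample point is accessible, so the result equals the full area of the probe-expanded sphere; a neighbouring atom reduces it. This all-pairs version scales quadratically with the number of atoms and is meant only to show the idea.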

Interface residues

Interface (I) residues in heterodimers are identified using the change in accessible surface area (ΔASA) from a “monomer-state” to a “dimer-state”. Residues with ΔASA > 0 Å² are considered to be at the interface. Thus, interface residues contributed by subunits A and B were identified.
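
The ΔASA criterion can be sketched as a comparison of per-residue ASA values for a subunit alone and for the same subunit in the complex. The dictionary layout, the `(chain, residue number)` keys and the function name are hypothetical, and the optional `tol` parameter is an assumption added to absorb rounding noise; the text itself uses a strict threshold of zero.

```python
def interface_residues(asa_monomer, asa_dimer, tol=0.0):
    """Return the residues whose ASA decreases on complex formation.

    asa_monomer: dict mapping residue key -> ASA (Å²) of the isolated subunit
    asa_dimer:   dict mapping residue key -> ASA (Å²) of the subunit in the complex
    tol:         residues with ΔASA > tol are reported (0.0 follows the text)
    """
    selected = []
    for key, monomer_value in asa_monomer.items():
        delta = monomer_value - asa_dimer[key]  # ΔASA for this residue
        if delta > tol:
            selected.append(key)
    return sorted(selected)


def interface_size(asa_monomer, asa_dimer, tol=0.0):
    """Interface size is the number of interface residues."""
    return len(interface_residues(asa_monomer, asa_dimer, tol))
```

Running this once per subunit and combining the two lists gives the interface residues contributed by subunits A and B.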

Interface size and Interface area

The distribution of complexes with interface size (number of interface residues) is given in Figure 3. The relationship between interface size and interface area is given in Figure 4.

Figure 3

Distribution of complexes based on interface size.

Figure 4

Relationship between interface size and interface area among complexes.

Interface property abundance

The interface between two interacting subunits is made of both polar and hydrophobic residues. The number of polar and hydrophobic residues at the interface varies from complex to complex. Some interfaces are rich in polar residues, while others are rich in hydrophobic residues. Therefore, we calculated the percentage of polar and hydrophobic residues at the interface for each complex. The difference in the percentages of polar (P) and hydrophobic (H) residues at the interface is measured (Figure 5). Thus, interface residues have “polar abundance” when %P - %H > 0 and “hydrophobic abundance” when %P - %H < 0. This helps to classify complexes by whether their interfaces have “abundant polar” or “abundant hydrophobic” residues.
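
The %P - %H measure can be written down directly. The sketch below uses the polar and hydrophobic residue groups given under “Interface, surface and core polarity”; the function name and the use of one-letter residue codes as input are assumptions made for illustration.

```python
POLAR = frozenset("RNDQHKSTYE")
HYDROPHOBIC = frozenset("ACGILMFPVW")


def polar_abundance(residues):
    """Return %P - %H for a sequence of one-letter interface residue codes.

    A positive value indicates polar abundance and a negative value
    indicates hydrophobic abundance.
    """
    residues = list(residues)
    if not residues:
        raise ValueError("at least one interface residue is required")
    n_polar = sum(r in POLAR for r in residues)
    n_hydrophobic = sum(r in HYDROPHOBIC for r in residues)
    return 100.0 * (n_polar - n_hydrophobic) / len(residues)
```

For example, an interface of four polar and two hydrophobic residues gives a positive difference, while an all-hydrophobic interface gives -100. The case %P - %H = 0 is not assigned to either group in the text and is left to the caller here.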

Figure 5

Cumulative distribution of complexes based on interface property. Complexes distributed in the positive X-axis have interfaces with polar residue abundance and those distributed in the negative X-axis have interfaces with hydrophobic residue abundance.

Surface residues

Surface (S) residues in heterodimers are identified using residue ASA values in a “dimer state”. Residues with ASA > 0 Å² are considered as surface residues. Thus, surface residues in the subunits A and B of the complex were identified.

Core residues

Core (C) residues in heterodimers are identified using residue ASA values in a “monomer state”. Residues with ASA = 0 Å² are considered as core residues. Thus, core residues in the subunits A and B were identified.

Interface, surface and core polarity

A protein heterodimer complex consists of three distinct regions (core (C), interface (I) and surface (S)) as shown in Figure 6. Interface, surface, core residues in a complex thus documented are further classified into polar and hydrophobic residues. Thus, interface, surface and core residues are grouped as polar {R, N, D, Q, H, K, S, T, Y, E} and hydrophobic {A, C, G, I, L, M, F, P, V, W} based on residue type. We then estimated the percentage of polar residues at interface (I), surface (S) and core (C) for each complex.
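
Putting the three region definitions together, the percentage of polar residues per region might be computed as in the sketch below. The input layout (one tuple per residue with its one-letter code, monomer-state ASA and dimer-state ASA) and the function names are hypothetical. The region tests follow the definitions in the text literally, so a residue that loses some but not all of its ASA on binding is counted in both the interface and the surface; how such overlaps were handled in the original analysis is not stated.

```python
POLAR = frozenset("RNDQHKSTYE")


def percent_polar(codes):
    """Percentage of polar residues in a list of one-letter codes (None if empty)."""
    codes = list(codes)
    if not codes:
        return None
    return 100.0 * sum(c in POLAR for c in codes) / len(codes)


def region_polarity(residues):
    """Percent polar residues at interface (I), surface (S) and core (C).

    residues: iterable of (one_letter_code, asa_monomer, asa_dimer) tuples,
    with ASA values in Å².
    """
    residues = list(residues)
    interface = [aa for aa, mono, dimer in residues if mono - dimer > 0]
    surface = [aa for aa, mono, dimer in residues if dimer > 0]
    core = [aa for aa, mono, dimer in residues if mono == 0]
    return {
        "I": percent_polar(interface),
        "S": percent_polar(surface),
        "C": percent_polar(core),
    }
```

The three percentages returned for a complex are the quantities compared when complexes are grouped into classes.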

Figure 6

Illustration of surface (S), core (C) and interface (I) regions in a heterodimer complex. The interface is the interacting region between the two protein partners. The core is the buried region in the individual monomers. The surface is the solvent exposed region in the complex state.

Classification of complexes

Complexes were grouped into four distinct classes based on the relative difference in percentage polar residues (referred to hereafter as polarity) between interface and core (Figure 7; Table 2). Complexes with interface polarity greater than core but less than surface, such that [S>I>C], are “class A”. Complexes with interface polarity greater than core and surface, such that [S<I>C], are “class B”. Complexes with interface polarity less than core and surface, such that [S>I<C], are “class C”. Complexes with interface polarity less than core but greater than surface, such that [S<I<C], are “class D”.

Figure 7

Grouping of the complexes based on their relative interface (I), core (C) and surface (S) polarity. Interfaces often have more polar residues than core in [I>C] groups. The hierarchical grouping shows the abundance of class A [S>I>C] and class B [S<I>C] complexes in the dataset. Class C [S>I<C] ...

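
The class assignment reduces to two comparisons of interface polarity, first against the core and then against the surface. The sketch below reads the four classes as S>I>C (A), S<I>C (B), S>I<C (C) and S<I<C (D); class D is taken here to be the one remaining ordering, which is an assumption. The function name is hypothetical, and ties between values are not addressed in the text, so the sketch returns `None` for them.

```python
def classify_interface(interface, surface, core):
    """Assign class A-D from percent polar residues at I, S and C.

    Returns None when the interface value ties with surface or core,
    a case the source text does not cover.
    """
    if interface == surface or interface == core:
        return None
    if interface > core:
        # Interface more polar than core: class A or B.
        return "A" if surface > interface else "B"
    # Interface less polar than core: class C or D.
    return "C" if surface > interface else "D"
```

The order of the tests mirrors the hierarchical grouping in Figure 7, where the comparison with the core comes first.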
Statistical analysis

Statistical significance was calculated using the GraphPad Prism (version 5) software [http://www.graphpad.com/]. The F test for variance comparison was used to assess the significance of functional preference between the DO and SO groups of complexes.

Results

The principles of PPI were studied using a dataset of 192 heterodimer complexes (Table 1) created using the procedure described in Figure 1. The dataset is divided based on the organism source of the interacting partners. Thus, SO, DO and SP groups of complexes were identified (Table 1). The distribution of complexes based on interacting protein size is given in Figure 2. This describes the size of the interacting protein partners forming the complex. These partners interact through interface residues. The distribution of interface size among heterodimer complexes is given in Figure 3. Interface area correlates with interface size (Figure 4). The chemical nature of interface residues in complexes is given in Figure 5. This shows that interface residues in complexes are either “abundantly polar” or “abundantly hydrophobic”. However, the majority of interfaces (121/192, 63%) have abundantly polar residues. The classification of complexes into classes A-D using relative polarity between interface, core and surface is shown in Table 2 and Figure 8. This grouping shows that the majority (191/192, 99%) of interfaces have polarity greater than core [I>C], as shown in Figure 7. However, the interface in one complex (1/192, <1%) has polarity less than core [I<C].

DO complexes are mostly inhibitory and SO complexes are usually associated with catalysis, regulation and assembly (Table 1; Table 3). Thus, the DO and SO groups of complexes show functional preference (p = 0.019). However, this is not true for classes (A-D), as shown in Table 4 (p = 0.12). Table 2 shows that complexes grouped in classes A, B, C and D do not show a significant difference in functional preference.

Figure 8

Distribution of complexes based on interface class. The distribution shows that 64% of complexes have “class A” interface and 36% of complexes have “class B” interface.

Discussion

Protein-protein interactions are vital for cellular function. Two different proteins associate with one another for functions (catalysis, regulation and assembly) that are often obligatory (essential for cellular activity). However, this is not always true. They also interact for inhibitory and immune-related roles, where their association is frequently non-obligatory (not essential for cellular activity). The dataset shows that an obligatory role is usually observed among SO complexes and non-obligatory functions are common among DO complexes. Thus, the functional role exhibited by complexes based on organism source is significantly distinct (p value = 0.019). However, the molecular principles for such associations are not clearly known. The molecular forces for protein interactions are gathered through analysis of known structural complexes. Hence, we describe the analysis of a dataset of 192 heterodimer complexes using polarity of the interface, surface and core for classifying them into classes A-D.

Analysis of protein structural complexes showed that interfaces are either “dominantly polar” [6] or “dominantly hydrophobic” [1, 2, 6]. It is also known that interfaces have more hydrophobic residues than the surface but fewer than the core [2]. Hydrophobic interfaces are similar to the surface, with few charged groups [6]. Our analysis shows that class A complexes have interface polarity greater than core but less than surface, as reported elsewhere [2]. Thus, this study confirms that observation using an extended dataset. Interfaces are part of the surfaces in the monomers, where the interface has more hydrophobic residues than the rest of the surface and the partners interact through relative hydrophobic forces. It should be noted that we identified an unusual complex (PDB code: 2F95) under class C describing the rhodopsin II/transducer interaction. The core is made of more polar residues than the interface in this complex. Thus, protein binding is hydrophobic, although folding of the individual monomers is driven by polar residues, as in several non-globular proteins.

We also identified class B complexes with interface polarity greater than both core and surface. In this class, the interface has more polar residues than the rest of the surface and the partners interact through polar interactions. Thus, relative polarity is the driving force in class B complexes. This class of interfaces has not been described in the literature and it is novel. The driving force for protein binding is hydrophobic in class A and polar in class B complexes. These observations using interface residue properties are important to the understanding of protein binding in heterodimer complexes. This study should be extended using a combined formulation of residue types and atomic features in future investigations. It should also be noted that interfaces between partners are part of surfaces in interacting monomers. These interfaces are clearly defined in known structural complexes. However, there are often several binding sites in an interacting monomer under in vivo conditions and these have not yet been characterized. Therefore, experiments should be formulated to capture these combined features in future studies.

Conclusion

Proteins associate with one another as a result of the combined effect of polar and hydrophobic residues at the interface. The unresolved challenge here is to quantify their combined effect at the interface. Inter-subunit scoring functions for polar and hydrophobic effects are available based on a limited set of structural complexes and are always inadequate to describe new classes of interfaces. It is known that interface residues are either “abundantly polar” or “abundantly hydrophobic”. It is also known that interfaces are less hydrophobic than core but more hydrophobic than surface in a class of complexes. We document a new class of complexes with more polar residues at the interface than at the core and surface. Thus, the driving force for protein-protein interaction is selectively either hydrophobic or polar for different classes of interfaces.

24 in total

1. Principles of protein-protein recognition.
Authors: C Chothia; J Janin
Journal: Nature       Date: 1975-08-28       Impact factor: 49.962
2. A dissection of specific and non-specific protein-protein interfaces.
Authors: Ranjit Prasad Bahadur; Pinak Chakrabarti; Francis Rodier; Joël Janin
Journal: J Mol Biol       Date: 2004-02-27       Impact factor: 5.469
Review 3. The structure of protein-protein recognition sites.
Authors: J Janin; C Chothia
Journal: J Biol Chem       Date: 1990-09-25       Impact factor: 5.157
4. SHARP2: protein-protein interaction predictions using patch analysis.
Authors: Yoichi Murakami; Susan Jones
Journal: Bioinformatics       Date: 2006-05-03       Impact factor: 6.937
5. Peptide segments in protein-protein interfaces.
Authors: Arumay Pal; Pinak Chakrabarti; Ranjit Bahadur; Francis Rodier; Joel Janin
Journal: J Biosci       Date: 2007-01       Impact factor: 1.826
6. ProtorP: a protein-protein interaction analysis server.
Authors: Christopher Reynolds; David Damerell; Susan Jones
Journal: Bioinformatics       Date: 2008-11-11       Impact factor: 6.937
7. A soft, mean-field potential derived from crystal contacts for predicting protein-protein interactions.
Authors: C H Robert; J Janin
Journal: J Mol Biol       Date: 1998-11-13       Impact factor: 5.469
8. The accessible surface area and stability of oligomeric proteins.
Authors: S Miller; A M Lesk; J Janin; C Chothia
Journal: Nature       Date: 1987 Aug 27-Sep 2       Impact factor: 49.962
9. Conserved residue clusters at protein-protein interfaces and their use in binding site identification.
Authors: Mainak Guharoy; Pinak Chakrabarti
Journal: BMC Bioinformatics       Date: 2010-05-27       Impact factor: 3.169
10. Protein subunit interfaces: heterodimers versus homodimers.
Authors: Cui Zhanhua; Jacob Gah-Kok Gan; Li Lei; Meena Kishore Sakharkar; Pandjassarame Kangueane
Journal: Bioinformation       Date: 2005-08-11
