Nicola Barbarini, Luca Simonelli, Alberto Azzalin, Sergio Comincini, Riccardo Bellazzi.
Abstract
Protein interactions are crucial in most biological processes. Several in silico methods have recently been developed to predict them. This paper describes a bioinformatics method that combines sequence similarity and structural information to support experimental studies on protein interactions. Given a target protein, the approach selects the most likely interactors among the candidates revealed by experimental techniques but not yet validated in vivo. The sequence and structural information of the in vivo confirmed proteins and complexes is exploited to evaluate the candidate interactors. Finally, a score is calculated to suggest the most likely interactors of the target protein. As an example, we searched for GRB2 interactors. We ranked a set of 46 candidate interactors with the presented method. These candidates were then reduced to 21 through a score threshold chosen by means of a cross-validation strategy. Among them, isoform 1 of MAPK14 was confirmed in silico as a GRB2 interactor. Finally, given a set of already confirmed interactors of GRB2, the accuracy and the precision of the approach were 75% and 86%, respectively. In conclusion, the proposed method can be used to select the proteins to be experimentally investigated within a set of potential interactors.
Year: 2010 PMID: 20625507 PMCID: PMC2896714 DOI: 10.1155/2010/670125
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. The bioinformatics strategy for protein interaction prediction.
Figure 2. Hydrogen bond geometrical structure scheme. The distance (d) between a donor (D) and an acceptor (A) and the resulting angles σ, β, θ and σ are reported.
List of the atoms considered as acceptors and donors. For both classes, the 3-letter code of the amino acid, the symbol used for the atom in PDB files, and the maximum number of hydrogen bonds the atom can form are reported.
| Acceptor residue | Acceptor atom | Max. H-bonds | Donor residue | Donor atom | Max. H-bonds |
|---|---|---|---|---|---|
| all | O | 1 | all | N | 1 |
| ASP | OD1 | 2 | HIS | NE2 | 1 |
| ASP | OD2 | 2 | HIS | ND1 | 1 |
| GLU | OE1 | 2 | LYS | NZ | 3 |
| GLU | OE2 | 2 | ASN | ND2 | 2 |
| GLN | OE1 | 1 | GLN | NE2 | 2 |
| ASN | OD1 | 1 | ARG | NE | 1 |
| SER | OG | 1 | ARG | NH1 | 2 |
| THR | OG1 | 1 | ARG | NH2 | 2 |
| | | | TRP | NE1 | 1 |
| | | | SER | OG | 1 |
| | | | THR | OG1 | 1 |
| | | | TYR | OH | 1 |
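The table above can be used as a lookup when scanning a complex for candidate hydrogen bonds. The following is a minimal sketch, not the authors' implementation. It assumes that the four trailing rows of the table (TRP NE1, SER OG, THR OG1, TYR OH) belong to the donor columns and that "all" refers to the atom of that name in every residue. The 3.5 Å distance cutoff and the function names are hypothetical, and the angle criteria of Figure 2 are ignored.

```python
import math

# (residue, atom) -> maximum number of hydrogen bonds, transcribed from the table.
# The key "all" matches the atom of that name in any residue.
ACCEPTORS = {
    ("all", "O"): 1,
    ("ASP", "OD1"): 2, ("ASP", "OD2"): 2,
    ("GLU", "OE1"): 2, ("GLU", "OE2"): 2,
    ("GLN", "OE1"): 1, ("ASN", "OD1"): 1,
    ("SER", "OG"): 1, ("THR", "OG1"): 1,
}
DONORS = {
    ("all", "N"): 1,
    ("HIS", "NE2"): 1, ("HIS", "ND1"): 1,
    ("LYS", "NZ"): 3, ("ASN", "ND2"): 2, ("GLN", "NE2"): 2,
    ("ARG", "NE"): 1, ("ARG", "NH1"): 2, ("ARG", "NH2"): 2,
    # Assumption: these four rows are donors (see the lead-in).
    ("TRP", "NE1"): 1, ("SER", "OG"): 1, ("THR", "OG1"): 1, ("TYR", "OH"): 1,
}

MAX_DISTANCE = 3.5  # hypothetical cutoff in angstroms; not taken from the source


def max_bonds(table, residue, atom):
    """Return the bond capacity of an atom, or 0 if the table does not list it."""
    return table.get((residue, atom), table.get(("all", atom), 0))


def may_bond(donor, acceptor, cutoff=MAX_DISTANCE):
    """Check a (residue, atom, xyz) donor/acceptor pair by table lookup and distance only."""
    d_res, d_atom, d_xyz = donor
    a_res, a_atom, a_xyz = acceptor
    if max_bonds(DONORS, d_res, d_atom) == 0:
        return False
    if max_bonds(ACCEPTORS, a_res, a_atom) == 0:
        return False
    return math.dist(d_xyz, a_xyz) <= cutoff
```

A fuller check would also test the angles shown in Figure 2 and keep a per-atom count so that no atom exceeds its maximum number of bonds.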
Example of the enlargement of a center of bond. The columns show the one-letter amino acid code, the residue position, the proximity status with respect to the pattern chain in the complex, the hydropathy status (1 = hydrophobic, 0 = hydrophilic) and the secondary structure (H = alpha helix; L = loop) of every amino acid around a center of bond. The center of bond is represented by the amino acids within the bold lines (i.e., S and N); the grey-highlighted rows result from the symmetrical enlargement due to hydropathy, while the amino acid reported in italic (V) is grouped because of its proximity to the opposite chain.
| Amino acid | Position | Proximity | Hydropathy | Sec.Struct. |
|---|---|---|---|---|
| V | 46 | 0 | 1 | |
| F | 47 | 0 | 1 | |
| V | | 1 | 1 | H |
| P | 49 | 0 | 0 | H |
| K | 50 | 0 | 0 | L |
| R | 53 | 0 | 0 | L |
| K | 54 | 0 | 0 | L |
| V | 55 | 0 | 1 | |
| I | 56 | 0 | 1 | |
Amino acid classes defined with respect to hydropathy and charge. Every row shows the class identifier and the amino acids belonging to the class.
| Class | Amino Acids |
|---|---|
| I | ILE-VAL-LEU |
| II | PHE-CYS-MET-ALA |
| III | GLY-THR-SER-TRP-TYR-PRO |
| IV | HIS-GLN-ASN |
| V | GLU-ASP |
| VI | LYS-ARG |
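The bracketed groups in motifs such as `PPP[IVL]` or `[KR]D[GTSWYP]` in the score table correspond to these six classes written in one-letter codes, so the motifs can be read as ordinary regular expressions. The sketch below illustrates this reading under that assumption. The helper names and the example sequence are hypothetical, and the way the authors derive and score motifs is not shown here.

```python
import re

# Class label -> member amino acids (one-letter codes), transcribed from the table.
CLASSES = {
    "I": "IVL",
    "II": "FCMA",
    "III": "GTSWYP",
    "IV": "HQN",
    "V": "ED",
    "VI": "KR",
}

# Reverse lookup: one-letter code -> class label.
RESIDUE_TO_CLASS = {aa: label for label, members in CLASSES.items() for aa in members}


def class_string(sequence):
    """Translate a protein sequence into the list of class labels of its residues."""
    return [RESIDUE_TO_CLASS[aa] for aa in sequence]


def find_motif(pattern, sequence):
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    # The lookahead lets re.finditer report matches that overlap each other.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


# Hypothetical example sequence, for illustration only.
positions = find_motif("PPP[IVL]", "APPPVPPPLA")
```

Here `positions` holds the start of each occurrence of three prolines followed by a class I residue, and `class_string("PPPV")` gives the class labels of that stretch.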
Overview of the scores used to rank the putative GRB2 interactors. For each candidate, the NCBI protein accession number, Score1, Score2, the sequence and the structural configuration of the best motif, Score3, the number of interaction sites, and Score4 are reported. The last column shows the final score assigned to each interactor.
| Accession number | Score1 | Score2 | Best sequence motif | Best struct. motif | Score3 | Number of int. sites | Score4 | Final score |
|---|---|---|---|---|---|---|---|---|
| NP_001306 | 344.7 | 2378 | PPP[IVL] | LLLL | 148.1 | 27 | 181.2 | 2.52 |
| NP_003014 | 4 | 2177 | PPP[IVL] | LLLL | 95 | 12 | 168 | 2.33 |
| NP_060910 | −11.3 | 1774 | PPP[IVL]P | LLLLL | 2469.1 | 18 | 8.9 | 2.21 |
| NP_004432 | 1688.3 | 3910 | [ED]D[ED] | LLL | 2 | 31 | 14.5 | 2.08 |
| NP_004407 | −43.3 | 2468 | PPP[IVL] | LLLL | 86 | 22 | 92.2 | 2.02 |
| NP_003713 | −40.3 | 2417 | PPP[IVL] | LLLL | 78.4 | 24 | 87 | 1.99 |
| NP_005145 | −52.3 | 3370 | [ED]N[IVL] | LLL | 1.2 | 34 | 15.1 | 1.98 |
| NP_002030 | −2.3 | 2269 | PPP[IVL] | LLLL | 76.8 | 21 | 83.3 | 1.95 |
| NP_002511 | −3.7 | 1220 | [ED]ED[ED] | LLLL | 136 | 20 | 151.8 | 1.9 |
| NP_005148 | 29.3 | 3261 | [ED]N[IVL] | LLL | 1.2 | 24 | 4.9 | 1.88 |
| NP_003311 | −25 | 2204 | [ED]ED[ED] | LLLL | 71.3 | 26 | 81.6 | 1.86 |
| NP_003362 | 1032.3 | 3399 | [ED]D[ED] | LLL | 2.4 | 24 | 7 | 1.86 |
| NP_006566 | 798 | 3359 | [ED]D[ED] | LLL | 2.4 | 27 | 9.9 | 1.84 |
| NP_149129 | −24.7 | 3141 | [HQN][KR]S[GTSWYP][GTSWYP] | HLLLL | 18.7 | 11 | 20.1 | 1.82 |
| NP_005556 | 93.7 | 2141 | [ED]ED[ED] | LLLL | 75 | 12 | 77 | 1.8 |
| NP_036252 | 11.3 | 2472 | PPP[IVL] | LLLL | 83.5 | 26 | 94.1 | 1.76 |
| NP_003806 | 516.3 | 3112 | [IVL]N[IVL] | LLL | 1 | 22 | 8.3 | 1.76 |
| NP_001973 | 955.7 | 1960 | [ED]ED[ED] | LLLL | 29.8 | 31 | 40.6 | 1.75 |
| NP_004439 | 1297.3 | 2468 | [ED]D[ED] | LLL | 1.6 | 32 | 11.7 | 1.73 |
| NP_612401 | −50.3 | 2223 | PPP[IVL] | LLLL | 75.4 | 11 | 77 | 1.73 |
| NP_542179 | −19.3 | 2370 | PPP[IVL] | LLLL | 91 | 27 | 103.9 | 1.72 |
| NP_004680 | −47.7 | 2296 | RR[KR] | LLL | 5.6 | 23 | 8.2 | 1.57 |
| NP_002244 | 1179 | 1981 | [HQN]Q[HQN] | LLL | 0.6 | 30 | 6.7 | 1.55 |
| NP_002960 | 297.7 | 2304 | [ED]D[ED] | LLL | 5.4 | 14 | 20.5 | 1.55 |
| NP_000689 | −36.3 | 2377 | [KR]D[GTSWYP] | ELL | 1 | 25 | 8.6 | 1.5 |
| NP_005875 | 258.7 | 2345 | RR[KR] | LLL | 6.8 | 13 | 12.6 | 1.5 |
| NP_114098 | −10.7 | 2370 | [ED]D[ED] | LLL | 3 | 20 | 8.9 | 1.49 |
| NP_036428 | −14.7 | 2471 | G[GTSWYP]F | LLL | 2 | 18 | 4.8 | 1.48 |
| NP_066189 | 237.7 | 2526 | [GTSWYP]E[IVL] | LLE | 0.7 | 9 | 1.1 | 1.48 |
| NP_000869 | −4.7 | 2251 | RR[KR] | LLL | 7.2 | 26 | 11.7 | 1.44 |
| NP_005222 | −22.3 | 2246 | [ED]D[ED] | LLL | 3.6 | 21 | 7.9 | 1.42 |
| NP_055413 | −20.7 | 2313 | [KR]D[GTSWYP] | ELL | 1.1 | 8 | 0.6 | 1.42 |
| NP_065795 | −33.3 | 2166 | RR[KR] | LLL | 7.8 | 18 | 6.1 | 1.35 |
| NP_000732 | 106.7 | 1894 | [IVL]N[HQN] | LLL | 1.8 | 16 | 4.9 | 1.28 |
| NP_002637 | −144.7 | 186 | PPP[IVL] | LLLL | 32.6 | 31 | 40.8 | 1.27 |
| NP_000675 | 347.7 | 1831 | [ED]D[ED] | LLL | 4.2 | 10 | 1.2 | 1.25 |
| NP_001773 | −28.7 | 1520 | [ED]N[IVL] | LLL | 3.7 | 13 | 22 | 1.23 |
| NP_003170 | −24.3 | 1300 | [ED]N[IVL] | LLL | 4.2 | 21 | 31.6 | 1.21 |
| NP_006454 | −18.7 | 1727 | [GTSWYP]K[ED] | HLL | 1.6 | 17 | 6.5 | 1.19 |
| NP_000013 | −26.7 | 1523 | [GTSWYP]E[KR] | LLE | 1.8 | 19 | 10.2 | 1.17 |
| NP_057627 | −28 | 1540 | [GTSWYP]S[IVL] | LLL | 1.2 | 13 | 4.2 | 1.12 |
| NP_689901 | 219.3 | 1436 | [GTSWYP]E[KR] | LLE | 1.9 | 17 | 8.9 | 1.1 |
| NP_004030 | −34.3 | 1448 | [GTSWYP]S[IVL] | LLL | 1.3 | 9 | 1.3 | 1.09 |
| NP_955359 | 376 | 1292 | [ED]D[ED] | LLL | 6.3 | 10 | 7.9 | 1.06 |
| NP_542417 | −207.7 | −4169 | [GTSWYP]RP[IVL]P | LLLLL | 82.4 | 26 | 110.2 | 1.03 |
| NP_002342 | 591 | 538 | [KR]D[GTSWYP] | ELL | 5.2 | 6 | 3.4 | 0.77 |
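The abstract states that the 46 ranked candidates were reduced to 21 through a score threshold. The threshold value itself is not given in this excerpt, so the sketch below uses a hypothetical cutoff of 1.7 on the final score; any value above 1.57 and up to 1.72 keeps the same 21 rows of the table. Applying the threshold to the final score column is an assumption, and the function name is illustrative.

```python
# Accession number -> final score, transcribed from the table above (46 rows).
FINAL_SCORES = {
    "NP_001306": 2.52, "NP_003014": 2.33, "NP_060910": 2.21, "NP_004432": 2.08,
    "NP_004407": 2.02, "NP_003713": 1.99, "NP_005145": 1.98, "NP_002030": 1.95,
    "NP_002511": 1.90, "NP_005148": 1.88, "NP_003311": 1.86, "NP_003362": 1.86,
    "NP_006566": 1.84, "NP_149129": 1.82, "NP_005556": 1.80, "NP_036252": 1.76,
    "NP_003806": 1.76, "NP_001973": 1.75, "NP_004439": 1.73, "NP_612401": 1.73,
    "NP_542179": 1.72, "NP_004680": 1.57, "NP_002244": 1.55, "NP_002960": 1.55,
    "NP_000689": 1.50, "NP_005875": 1.50, "NP_114098": 1.49, "NP_036428": 1.48,
    "NP_066189": 1.48, "NP_000869": 1.44, "NP_005222": 1.42, "NP_055413": 1.42,
    "NP_065795": 1.35, "NP_000732": 1.28, "NP_002637": 1.27, "NP_000675": 1.25,
    "NP_001773": 1.23, "NP_003170": 1.21, "NP_006454": 1.19, "NP_000013": 1.17,
    "NP_057627": 1.12, "NP_689901": 1.10, "NP_004030": 1.09, "NP_955359": 1.06,
    "NP_542417": 1.03, "NP_002342": 0.77,
}

THRESHOLD = 1.7  # hypothetical cutoff; the value used in the paper is not given here


def select_candidates(scores, threshold):
    """Return accessions scoring at or above the threshold, best first."""
    kept = [(acc, s) for acc, s in scores.items() if s >= threshold]
    # sorted() is stable, so candidates with equal scores keep their table order.
    return [acc for acc, _ in sorted(kept, key=lambda item: -item[1])]


selected = select_candidates(FINAL_SCORES, THRESHOLD)
```

With this cutoff `selected` contains 21 accession numbers, matching the number of retained candidates reported in the abstract.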
Figure 3. Three-dimensional structures of the human proteins ERK2 (a) and MAPK14 (b) from the PDB database (PDB files 2E14 and 1A9U, respectively). Critical residues and their positions within the two proteins are reported.