Sung Hee Park, José A Reyes, David R Gilbert, Ji Woong Kim, Sangsoo Kim.
Abstract
BACKGROUND: Protein-protein interactions (PPIs) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches.
Year: 2009 PMID: 19173748 PMCID: PMC2667511 DOI: 10.1186/1471-2105-10-36
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data set of protein complexes
| Type Name | Type of Interaction | #. of Complexes | #. of Domains |
| ENZ | Enzyme-inhibitors | 25 | 49 |
| nonENZ | Non Enzyme-inhibitors | 21 | 47 |
| HET | Hetero-obligomers | 14 | 33 |
| HOM | Homo-obligomers | 87 | 225 |
ENZ: Enzyme-inhibitor interactions;
nonENZ: non-Enzyme-inhibitor interactions;
HET: Hetero-obligate interactions;
HOM: Homo-obligate interactions.
Average values of the properties
| Type | ASA (Å²) | HH | inPro | nAtom | nAA | nSSE | LCS | nFrag |
| ENZ | 860.42 | 0.40 | 0.596 | 121.73 | 33.71 | 11.22 | 3.3 | 12.32 |
| nonENZ | 823.06 | 0.37 | 0.530 | 106.89 | 29.59 | 12.91 | 2.5 | 12.91 |
| HET | 2237.92 | 0.41 | 0.982 | 344.26 | 82.56 | 21.35 | 3.5 | 21.35 |
| HOM | 1306.37 | 0.42 | 0.262 | 184.55 | 48.14 | 13.00 | 2.9 | 16.78 |
Figure 1. Distribution of SSE content. The average distribution of SSE content is distinctive among the different PPI types. More than 40% of the atoms in interaction sites are positioned in non-regular regions for all PPI types. Interaction sites contain a higher proportion of non-regular regions than of helix or strand regions. In particular, less than 20% of interaction sites are composed of strands.
The number of association rules discovered for each PPI type
| Type | #. of Domains | #. of Rules | Unique Rules | Overlapping Rules |
| ENZ | 49 | 65 | 34 (52.31%) | 31 (47.69%) |
| nonENZ | 47 | 49 | 16 (32.65%) | 33 (67.35%) |
| HET | 33 | 19 | 7 (36.84%) | 12 (63.16%) |
| HOM | 225 | 24 | 1 (4.17%) | 23 (95.83%) |
| Total | 354 | 157 | 58 (36.94%) | 99 (63.06%) |
#. of Domains: Number of domains in each PPI type;
#. of Rules: Number of association rules discovered for each PPI type;
Unique Rules: Number of association rules associated with just one PPI type;
Overlapping Rules: Number of rules whose bodies are identical to those of rules in other types.
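The split into unique and overlapping rules can be illustrated with a short sketch. This is not the authors' implementation: the function name and the (body, type) tuple layout are assumptions made for illustration, and the sample rules are a small subset copied from the rule tables in this record.

```python
from collections import defaultdict

def split_unique_overlapping(rules):
    """Split (body, ppi_type) pairs into unique and overlapping rules.

    A rule counts as overlapping when its body also appears as the body
    of a rule whose head is a different PPI type.
    """
    types_by_body = defaultdict(set)
    for body, ppi_type in rules:
        types_by_body[body].add(ppi_type)
    unique = [r for r in rules if len(types_by_body[r[0]]) == 1]
    overlapping = [r for r in rules if len(types_by_body[r[0]]) > 1]
    return unique, overlapping

# Illustrative subset only; rule bodies are kept as plain strings.
rules = [
    ("77.31 <= Loop < 80.56", "ENZ"),
    ("84.76 <= nAtom < 125.14 AND SCOPClass = 2", "ENZ"),
    ("84.76 <= nAtom < 125.14 AND SCOPClass = 2", "nonENZ"),
    ("3.17 <= LCS < 3.6", "HET"),
    ("3.17 <= LCS < 3.6", "HOM"),
]

unique, overlapping = split_unique_overlapping(rules)
unique_pct = 100 * len(unique) / len(rules)
print(f"{len(unique)} unique, {len(overlapping)} overlapping "
      f"({unique_pct:.2f}% unique)")
```

The percentages in the table follow the same arithmetic; for example, the Total row gives 58 / 157 = 36.94% unique rules.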
Accuracy of different classification methods
| DT | RF | KNN | SVM | NB | |
| 0.924 | 0.968 | 0.943 | 0.999 | 0.476 | |
| 0.926 | 0.971 | 0.978 | 0.999 | 0.531 | |
| 0.873 | 0.933 | 0.893 | 0.970 | 0.519 | |
| 0.917 | 0.951 | 0.936 | 0.992 | 0.451 | |
| 0.927 | 0.970 | 0.979 | 0.988 | 0.492 | |
| 0.800 | 0.850 | 0.800 | 0.890 | 0.483 |
Method: The classification method used, namely Decision Tree (DT), Random Forest (RF), K Nearest Neighbor (KNN), Support Vector Machine (SVM) or Naive Bayes (NB);
ARBC: Association rule based classification;
CWAR: Classification based on physicochemical properties;
UR: ARBC classification using 58 unique association rules;
: Data sets with exclusion of SSE content from All data¹;
¹All data: Data sets including SSE content;
²No SSE data: Data sets without inclusion of SSE content.
Analysis of SSE content rules over different subsets
| Subset | #. of rules | Fraction(%) | conf1 | #. of SSE rules | conf2 |
| SSE | 22 | 14.01% | 0.533 | - | - |
| TOPK | 31 | 19.75% | 0.642 | 13 | 0.661 |
| Unique | 58 | 36.94% | 0.536 | 16 | 0.622 |
conf1: Average confidence of a rule subset;
conf2: Average confidence of SSE content rules in a rule subset;
SSE: Association rules encoding SSE content;
TOPK: Top K rules covering top 20% in confidence;
Unique: Unique rules.
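A minimal sketch of how a TOPK subset and its average confidence could be derived is given below. The function names, the dictionary layout, and the choice to round the cutoff down are assumptions for illustration, and the confidences are synthetic placeholders rather than values from the paper. With 157 rules and a 20% cutoff, rounding down gives 31 rules, which corresponds to the 19.75% fraction listed in the table.

```python
def top_k_by_confidence(rules, fraction=0.2):
    """Return the highest-confidence rules making up `fraction` of all rules."""
    k = int(len(rules) * fraction)  # assumption: the cutoff is rounded down
    return sorted(rules, key=lambda r: r["conf"], reverse=True)[:k]

def average_confidence(rules):
    """Mean confidence of a rule subset (the 'conf1' style summary)."""
    return sum(r["conf"] for r in rules) / len(rules)

# Synthetic placeholder confidences, NOT data from the paper.
rules = [{"id": i, "conf": (i % 100) / 100} for i in range(157)]

top = top_k_by_confidence(rules)
print(len(top), f"{100 * len(top) / len(rules):.2f}%")
print(round(average_confidence(top), 3))
```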
Representative examples of association rules for each type
| # | O | Rule description | Type | Conf | Supp | C | G | K | U | S | I |
| 1 | 3 | If 77.31 ≤ Loop < 80.56 | ENZ | 0.811 | 0.032 | 1 | 0.214 | 1 | 1 | 1 | 0.722 |
| 2 | 8 | If 17.57 ≤ Helix < 20.87 | ENZ | 0.545 | 0.032 | 1 | 0.102 | 1 | 1 | 1 | 0.668 |
| 3 | 9 | If SCOPClass = 7 | ENZ | 0.725 | 0.053 | 1 | 0.184 | 1 | 1 | - | 0.660 |
| 4 | 26 | If 67.59 ≤ Loop < 70.83 | ENZ | 0.526 | 0.032 | - | 0.048 | 1 | 1 | 1 | 0.601 |
| 5 | 28 | If 461.83 ≤ df-ASA < 681.42 AND 2.3 ≤ LCS < 2.73 | ENZ | 0.625 | 0.032 | - | 0.120 | 1 | 1 | - | 0.555 |
| 6 | 37 | If 57.87 ≤ Loop < 61.11 | ENZ | 0.467 | 0.037 | - | 0.045 | - | 1 | 1 | 0.510 |
| 7 | 2 | If SCOPClass = 1 AND 12.25 ≤ nFrag < 16 AND NoStrand | nonENZ | 0.882 | 0.032 | 1 | 0.250 | 1 | 1 | 1 | 0.738 |
| 8 | 11 | If .66 ≤ inPro < .87 | nonENZ | 0.597 | 0.042 | 1 | 0.129 | 1 | 1 | - | 0.628 |
| 9 | 15 | If 26.74 ≤ nAA < 35.32 AND 901.01 ≤ df-ASA < 1120.6 | nonENZ | 0.556 | 0.032 | 1 | 0.133 | 1 | 1 | - | 0.620 |
| 10 | 18 | If SCOPClass = 1 AND 1.87 ≤ LCS < 2.3 | nonENZ | 0.545 | 0.032 | 1 | 0.137 | 1 | 1 | - | 0.619 |
| 11 | 20 | If 1.43 ≤ LCS < 1.87 | nonENZ | 0.556 | 0.042 | 1 | 0.074 | 1 | 1 | - | 0.612 |
| 12 | 21 | If NoStrand AND 1.87 ≤ LCS < 2.3 | nonENZ | 0.515 | 0.037 | - | 0.113 | 1 | 1 | 1 | 0.611 |
| 13 | 36 | If 58.11 ≤ ASAPR < 59.52 | nonENZ | 0.476 | 0.032 | 1 | 0.065 | - | 1 | - | 0.515 |
| 14 | 38 | If 41.67 ≤ Loop < 44.91 | nonENZ | 0.423 | 0.032 | - | 0.046 | - | 1 | 1 | 0.500 |
| 15 | 40 | If SCOPClass = 1 AND NoStrand | nonENZ | 0.484 | 0.064 | - | 0.074 | - | 1 | 0.406 | |
| 16 | 46 | If 125.14 ≤ nAtom < 165.52 AND 901.01 ≤ df-ASA < 1120.6 | nonENZ | 0.412 | 0.037 | - | 0.050 | - | 1 | - | 0.375 |
| 17 | 64 | If .42 ≤ HH < .44 | nonENZ | 0.347 | 0.037 | - | 0.009 | - | 1 | - | 0.348 |
| 18 | 5 | If 7.78 ≤ Strand < 10.27 | HET | 0.660 | 0.037 | 1 | 0.141 | 1 | 1 | 1 | 0.691 |
| 19 | 7 | If 2.8 ≤ Strand < 5.29 | HET | 0.565 | 0.037 | 1 | 0.089 | 1 | 1 | 1 | 0.670 |
| 20 | 12 | If 205.9 ≤ nAtom < 246.28 | HET | 0.574 | 0.037 | 1 | 0.143 | 1 | 1 | - | 0.626 |
| 21 | 25 | If 44.91 ≤ Loop < 48.15 | HET | 0.479 | 0.037 | 1 | 0.110 | - | 1 | 1 | 0.604 |
| 22 | 32 | If 3.6 ≤ LCS < 4.03 | HET | 0.461 | 0.037 | 1 | 0.100 | - | 1 | - | 0.520 |
| 23 | 33 | If .44 ≤ HH < .46 | HET | 0.467 | 0.045 | 1 | 0.070 | - | 1 | - | 0.516 |
| 24 | 63 | If SCOPClass = 1 AND NoStrand | HET | 0.282 | 0.037 | - | 0.074 | - | - | 1 | 0.348 |
| 25 | 31 | If SCOPClass = 3 AND 2.3 ≤ LCS < 2.73 | HOM | 0.470 | 0.033 | 1 | 0.100 | - | 1 | - | 0.521 |
| 26 | 98 | If 3.17 ≤ LCS < 3.6 | HOM | 0.337 | 0.035 | - | 0.034 | - | - | - | 0.135 |
| 27 | 133 | If 26.74 ≤ nAA < 35.32 | HOM | 0.237 | 0.039 | - | 0.041 | - | - | - | 0.106 |
Twenty-seven representative rules from within the top 30% are listed, sorted by the columns Type and I. Rules ranked lower than 48 in order (O) were added to illustrate overlapping rules and to allow comparison with rules produced by a decision tree.
#: Rule identifier;
O: Order of a rule ranking by importance factor;
Rule description: The body of a rule;
Type: The head of a rule representing a PPI type;
Conf: Confidence of a rule;
Supp: Support of a rule;
C: Rules selected from correlation-based feature subset selection [32];
G: The worth of a rule, measured by the gain ratio [33] with respect to PPI types;
K: Top K rules ranked within top 30%;
U: Unique rules;
S: SSE content rules;
I: Importance factor of a rule, calculated as the average of the factors Conf, Supp, C, G, K, U and S; "-" is replaced with the value 0 when the importance factor is calculated.
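The importance factor can be sketched as a plain average of the seven factors. The function below is an illustration, not the authors' code; its name and the `dash_as_zero` flag are hypothetical. For a row with no "-" entries, such as rule 1, the average reproduces the tabulated value of 0.722. For rows that contain "-", such as rule 3, the tabulated value (0.660) appears to match an average taken over the non-"-" entries only, so the sketch exposes both conventions rather than asserting which one was used.

```python
def importance_factor(conf, supp, c, g, k, u, s, dash_as_zero=True):
    """Average the seven factors of a rule; None stands for a '-' entry.

    dash_as_zero=True follows the table footnote ('-' counted as 0).
    dash_as_zero=False averages only the entries that are present.
    """
    factors = [conf, supp, c, g, k, u, s]
    if dash_as_zero:
        values = [0.0 if f is None else f for f in factors]
    else:
        values = [f for f in factors if f is not None]
    return sum(values) / len(values)

# Rule 1 has no '-' entries, so both conventions give the same result.
rule1 = dict(conf=0.811, supp=0.032, c=1, g=0.214, k=1, u=1, s=1)
print(round(importance_factor(**rule1), 3))

# Rule 3 has '-' in column S; the two conventions differ.
rule3 = dict(conf=0.725, supp=0.053, c=1, g=0.184, k=1, u=1, s=None)
print(round(importance_factor(**rule3), 3))
print(round(importance_factor(**rule3, dash_as_zero=False), 3))
```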
Representative examples of ENZ type presenting different structural features
| # | O | Rule description | Subtype | Conf | Supp | C | G | K | U | S | I |
| 28 | 24 | If NoHelix | ENZ_A, ENZ_B, ENZ_C | 0.508 | 0.069 | - | 0.058 | 1 | 1 | 1 | 0.606 |
| 29 | 1 | If SCOPClass = 7 AND NoHelix | ENZ_A, ENZ_B | 1.000 | 0.032 | 1 | 0.315 | 1 | 1 | 1 | 0.764 |
| 30 | 17 | If 461.83 ≤ df-ASA < 681.42 AND NoHelix | ENZ_A, ENZ_B | 0.593 | 0.037 | - | 0.085 | 1 | 1 | 1 | 0.619 |
| 31 | 39 | If 461.83 ≤ df-ASA < 681.42 | ENZ_A, ENZ_B | 0.477 | 0.111 | 1 | 0.076 | - | - | - | 0.416 |
| 32 | 16 | If NoHelix AND nFrag < 4.75 | ENZ_A | 0.612 | 0.032 | - | 0.076 | 1 | 1 | 1 | 0.620 |
| 33 | 19 | If 4.75 ≤ nSSE < 6.62 AND NoHelix | ENZ_A | 0.588 | 0.032 | - | 0.072 | 1 | 1 | 1 | 0.538 |
| 34 | 51 | If 461.83 ≤ df-ASA < 681.42 AND 4.75 ≤ nSSE < 6.62 | ENZ_A | 0.417 | 0.032 | - | 0.018 | - | 1 | - | 0.367 |
| 35 | 77 | If 44.38 ≤ nAtom < 84.76 AND 461.83 ≤ df-ASA < 681.42 | ENZ_A | 0.396 | 0.058 | - | 0.023 | - | - | - | 0.159 |
| 36 | 34 | If 9.58 ≤ nAA < 18.16 AND 44.38 ≤ nAtom < 84.76 AND 461.83 ≤ df-ASA < 681.42 | ENZ_A | 0.500 | 0.032 | - | 0.045 | 1 | 1 | - | 0.515 |
| 37 | 60 | If 18.16 ≤ nAA < 26.74 AND 44.38 ≤ nAtom < 84.76 | ENZ_A | 0.357 | 0.032 | - | 0.015 | - | 1 | - | 0.351 |
| 38 | 10 | If 84.76 ≤ nAtom < 125.14 AND 461.83 ≤ df-ASA < 681.42 | ENZ_B | 0.617 | 0.053 | 1 | 0.145 | 1 | 1 | - | 0.636 |
| 39 | 13 | If 12.66 ≤ sRatio < 15.06 AND 461.83 ≤ df-ASA < 681.42 | ENZ_B | 0.600 | 0.032 | 1 | 0.113 | 1 | 1 | - | 0.624 |
| 40 | 14 | If 461.83 ≤ df-ASA < 681.42 AND 10.38 ≤ nSSE < 12.25 AND SCOPClass = 2 | ENZ_B | 0.857 | 0.032 | - | 0.230 | 1 | 1 | - | 0.624 |
| 41 | 27 | If SCOPClass = 2 AND 461.83 ≤ df-ASA < 681.42 AND 84.76 ≤ nAtom < 125.14 | ENZ_B | 0.789 | 0.032 | - | 0.176 | 1 | 1 | - | 0.599 |
| 42 | 35 | If 10.38 ≤ nSSE < 12.25 AND 12.25 ≤ nFrag < 16 | ENZ_B | 0.500 | 0.032 | - | 0.043 | 1 | 1 | - | 0.515 |
| 43 | 73 | If 84.76 ≤ nAtom < 125.14 AND SCOPClass = 2 | ENZ_B | 0.408 | 0.042 | - | 0.043 | - | - | - | 0.164 |
| 44 | 114 | If 84.76 ≤ nAtom < 125.14 AND 26.74 ≤ nAA < 35.32 | ENZ_B | 0.307 | 0.037 | - | 0.024 | - | - | - | 0.123 |
| 45 | 109 | If 681.42 ≤ df-ASA < 901.01 | ENZ_C | 0.317 | 0.048 | - | 0.013 | - | - | - | 0.126 |
| 46 | 137 | If 84.76 ≤ nAtom < 125.14 AND 681.42 ≤ df-ASA < 901.01 | ENZ_C | 0.252 | 0.032 | - | 0.009 | - | - | - | 0.098 |
| 47 | 146 | If SCOPClass = 4 | ENZ_C | 0.221 | 0.042 | - | 0.011 | - | - | - | 0.091 |
| 48 | 101 | If 35.32 ≤ nAA < 43.9 AND 125.14 ≤ nAtom < 165.52 | ENZ_D | 0.323 | 0.032 | - | 0.041 | - | - | - | 0.132 |
| 49 | 130 | If SCOPClass = 3 | ENZ_D | 0.238 | 0.069 | - | 0.016 | - | - | - | 0.108 |
| 50 | 141 | If 901.01 ≤ df-ASA < 1120.6 | ENZ_D | 0.207 | 0.032 | - | 0.050 | - | - | - | 0.096 |
| 51 | 54 | If 1120.6 ≤ df-ASA < 1340.19 | ENZ_E | 0.392 | 0.042 | - | 0.018 | - | 1 | - | 0.363 |
Column name abbreviations are the same as in Table 6.
The ENZ subtypes are defined in Figure 4. Note that ENZ_B includes both inhibitors and enzymes while the others are exclusively formed by inhibitors (e.g. ENZ_A, ENZ_C and ENZ_E) or enzymes (e.g. ENZ_D).
Figure 2. A scatter plot matrix for PPI types and association rules. This scatter plot matrix shows clusters as collections of points separated by association rules encoding SSE content information or a SCOP class. The colors on the left of each plot (a cell) correspond to the four PPI types. The right of each plot area presents the distribution of points that satisfy the rule at the head of the cell. Rules 29, 40, 1 and 3 separate ENZ and nonENZ from the other types with few errors. Rule 29 is a strong discriminator that separates ENZ from the other types completely.
Figure 3. 2D plots for pairs of association rules. These plots show data points by pairs of association rules. The X and Y axes each represent one rule of a pair and take two boolean values: 0 represents negative data points that do not satisfy the rule on that axis, and 1 represents positive data points that satisfy it. The data points in the upper left corner satisfy the rule used for the Y axis, and the data points in the lower right corner satisfy the rule used for the X axis. The points in the upper right corner satisfy both rules. Plots (a), (b) and (c) characterize the distribution of inhibitors in enzyme-inhibitor interactions. Rule 28 is used for the X axis in plots (a), (b) and (c), and Rules 1, 3 and 38 are used for the Y axis in those plots. (a) is an example of a pair of rules that both include SSE information (e.g. helix and loop content). (b) and (c) show examples that combine SSE content information (Rule 28: "NoHelix") with other properties (e.g. SCOPClass and number of atoms). Plot (b) (Rule 3 versus Rule 28) is identical to the plot generated by Rule 29. Enzymes interacting with the group of inhibitors characterized by (a), (b) and (c) are featured in Figure 3(e) and (f). Enzymes and inhibitors described by Rules 40 and 29 respectively are plotted in (d), where no point matches both rules. Plot (d) reflects a proper interpretation of association rules regarding interactions between enzymes and inhibitors.
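The four corners of such a 2D plot amount to a 2x2 count of domains by whether each rule is satisfied. The sketch below illustrates this for Rule 28 (X axis) against Rule 1 (Y axis). It is a hedged illustration only: the function names and dictionary keys are hypothetical, "NoHelix" is assumed to mean a helix content of zero, and the domain values are invented rather than taken from the data set.

```python
from collections import Counter

def rule_28(domain):
    # "NoHelix": assumed here to mean zero helix content.
    return domain["Helix"] == 0

def rule_1(domain):
    # Rule 1 body: 77.31 <= Loop < 80.56
    return 77.31 <= domain["Loop"] < 80.56

def corner_counts(domains, rule_x, rule_y):
    """Count domains at each (x, y) corner; 1 = rule satisfied, 0 = not."""
    return Counter((int(rule_x(d)), int(rule_y(d))) for d in domains)

# Invented toy domains, for illustration only.
domains = [
    {"Helix": 0.0, "Loop": 78.0},   # satisfies both rules -> (1, 1)
    {"Helix": 0.0, "Loop": 60.0},   # X rule only          -> (1, 0)
    {"Helix": 25.0, "Loop": 79.5},  # Y rule only          -> (0, 1)
    {"Helix": 30.0, "Loop": 40.0},  # neither rule         -> (0, 0)
]

counts = corner_counts(domains, rule_28, rule_1)
for corner in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(corner, counts[corner])
```

An empty (1, 1) corner, as described for plot (d), would show up here as a count of zero for that key.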
Figure 4. A hierarchical tree for supporting inference of subtypes. A hierarchical tree drawn from association rules (Table 7) represents different structural groups in ENZ. Enzyme-inhibitor interactions are characterized by the size of interaction sites (number of atoms and df-ASA) and by SSE content information (helix content). These differences between structural groups give rise to subtypes of PPIs. Letters in red are identifiers of the rules (Tables 6 and 7) used to split branches of the tree. Dashed lines show interactions between enzymes and inhibitors in different subtypes.
PART rules generated by decision trees using C4.5
| # | Rules discovered by C4.5 Decision Tree | Type | Conf | Supp | Corresponding rules |
| 5 | AVGASA > 68.73025 AND nAtom > 60 AND LCS > 2.61 AND Strand ≤ 32.857 AND SCOPClass = 7 | ENZ | 1 | 0.03 | 35, 5, 3, 36 |
| 38 | sRatio ≤ 29.411765 AND HH > 0.277096 AND SCOPClass = 2 AND Strand > 16.949 AND Strand > 21.324 AND nSSE > 10 | ENZ | 1 | 0.02 | 40, 39 |
| 4 | Loop > 50.299 AND nAtom > 60 AND Helix ≤ 33.636 AND AVGASA ≤ 41.137133 | ENZ | 0.99 | 0.07 | 35, 6 |
| 27 | inPro ≤ 2.016077 AND Helix > 48.485 AND LCS > 1.727 AND Strand ≤ 8.571 AND SCOPClass = 1 AND AVGASA ≤ 53.133 | nonENZ | 1 | 0.02 | 8, 10 |
| 40 | SCOPClass = 1 AND Strand ≤ 2.26 | nonENZ | 1 | 0.01 | 15 |
| 1 | nAtom > 189 AND Loop ≤ 66.316 AND nSSE > 13 AND Helix ≤ 19.481 AND sRatio ≤ 80.833 AND inPro > -1.570 AND LCS > 3.714 AND Loop ≤ 46.7 | HET | 1 | 0.05 | 20, 21 |
| 3 | nAtom > 212 AND Strand ≤ 10.738 AND nSSE > 13 AND inPro > -1.476973 AND nAtom > 384 | HET | 1 | 0.05 | 20, 18, 19 |
| 34 | SCOPClass = 3 AND Helix > 18.421 | HOM | 1 | 0.02 | 25 |
| 15 | HH > 0.433 AND AVGASA > 55.984 AND nAA ≤ 34 | HOM | 1 | 0.01 | 27 |
: A total of 44 rules produced by a decision tree using C4.5 algorithm in WEKA machine learning library;
#: PART rule identifier;
Corresponding rules: Association rule identifiers (Tables 6, 7 and 8) corresponding to a PART rule.
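A PART rule and its corresponding association rule are both conjunctions of simple conditions, so they can be checked against a domain in the same way. The sketch below compares PART rule 40 with association rule 15. The tuple representation, the `matches` helper and the example domains are assumptions for illustration, and "NoStrand" is assumed to mean a strand content of zero.

```python
import operator

# Comparison operators allowed in a condition.
OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}

def matches(conditions, domain):
    """True when the domain satisfies every (feature, op, value) condition."""
    return all(OPS[op](domain[feature], value)
               for feature, op, value in conditions)

# PART rule 40: SCOPClass = 1 AND Strand <= 2.26 (head: nonENZ)
part_rule_40 = [("SCOPClass", "==", 1), ("Strand", "<=", 2.26)]

# Association rule 15: SCOPClass = 1 AND NoStrand (head: nonENZ);
# NoStrand is assumed to mean Strand == 0.
assoc_rule_15 = [("SCOPClass", "==", 1), ("Strand", "==", 0)]

# Invented example domains.
no_strand = {"SCOPClass": 1, "Strand": 0.0}
low_strand = {"SCOPClass": 1, "Strand": 2.0}

print(matches(part_rule_40, no_strand), matches(assoc_rule_15, no_strand))
print(matches(part_rule_40, low_strand), matches(assoc_rule_15, low_strand))
```

Under these assumptions every domain covered by association rule 15 is also covered by PART rule 40, while the PART threshold additionally admits domains with a small nonzero strand content.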
Representative examples of overlapping association rules
| # | # (Tables 6 and 7) | Rule description | Types | Conf (Type1) | Supp (Type1) | Conf (Type2) | Supp (Type2) |
| 52 | 43 | If 84.76 ≤ nAtom < 125.14 AND SCOPClass = 2 | ENZ1 OR nonENZ2 | 0.408 | 0.042 | 0.306 | 0.032 |
| 53 | 35 | If 44.38 ≤ nAtom < 84.76 AND 461.83 ≤ df-ASA < 681.42 | ENZ1 OR nonENZ2 | 0.396 | 0.058 | 0.252 | 0.037 |
| 54 | 48 | If 35.32 ≤ nAA < 43.9 AND 125.14 ≤ nAtom < 165.52 | ENZ1 OR nonENZ2 | 0.323 | 0.032 | 0.376 | 0.037 |
| 55 | 46 | If 84.76 ≤ nAtom < 125.14 AND 681.42 ≤ df-ASA < 901.01 | ENZ1 OR nonENZ2 | 0.252 | 0.032 | 0.336 | 0.042 |
| 56 | 26 | If 3.17 ≤ LCS < 3.6 | HET1 OR HOM2 | 0.357 | 0.037 | 0.337 | 0.035 |
Examples of overlapping rules were selected from Tables 6 and 7.
#: Rule identifier;
#: Rule identifier in Tables 6 and 7;
Rule description: The body of overlapping rules between the two types;
Types: PPI Type1 and Type2 having overlapping rules in common;
Conf: Confidences of overlapping rules for Type1 and Type2 respectively;
Supp: Supports of overlapping rules for Type1 and Type2 respectively.
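An overlapping rule has one body but a separate confidence and support for each PPI type. The record does not spell out how Conf and Supp were computed, so the sketch below assumes the usual association-rule definitions: support is the fraction of all domains that satisfy the body and belong to the type, and confidence is the fraction of body-satisfying domains that belong to the type. The function name, dictionary keys and the five domains are invented for illustration.

```python
def confidence_and_support(domains, body, ppi_type):
    """Confidence and support of the rule 'body -> ppi_type'.

    Assumes the standard definitions:
      support    = |body and type| / |all domains|
      confidence = |body and type| / |body|
    """
    n_body = sum(1 for d in domains if body(d))
    n_both = sum(1 for d in domains if body(d) and d["type"] == ppi_type)
    conf = n_both / n_body if n_body else 0.0
    supp = n_both / len(domains)
    return conf, supp

def body_56(domain):
    # Body shared by HET and HOM in the table: 3.17 <= LCS < 3.6
    return 3.17 <= domain["LCS"] < 3.6

# Invented toy domains, NOT data from the paper.
domains = [
    {"LCS": 3.2, "type": "HET"},
    {"LCS": 3.5, "type": "HOM"},
    {"LCS": 3.4, "type": "HOM"},
    {"LCS": 3.3, "type": "ENZ"},
    {"LCS": 2.0, "type": "HOM"},
]

for ppi_type in ("HET", "HOM"):
    conf, supp = confidence_and_support(domains, body_56, ppi_type)
    print(ppi_type, round(conf, 3), round(supp, 3))
```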