Pall F Jonsson, Tamara Cavanna, Daniel Zicha, Paul A Bates.
Abstract
BACKGROUND: Protein-protein interactions have traditionally been studied on a small scale, using classical biochemical methods to investigate the proteins of interest. More recently, large-scale methods, such as two-hybrid screens, have been utilised to survey extensive portions of genomes. Current high-throughput approaches have a relatively high rate of errors, whereas in-depth biochemical studies are too expensive and time-consuming to be practical for extensive studies. As a result, there are gaps in our knowledge of many key biological networks, for which computational approaches are particularly suitable.
Year: 2006 PMID: 16398927 PMCID: PMC1363365 DOI: 10.1186/1471-2105-7-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Inferring interactions by homology. Each interaction is inferred from homology to experimentally observed interactions. In this schematic, proteins a and b have been shown experimentally to interact in one organism, here labelled 'species X', and proteins a and b in another, 'species Y'. Lists of homologues are generated for each of the proteins, ranked by their bit score. A protein from one list may interact with a protein from the other (shown by the red arrow), and potential pairwise interactions are scored according to Equation 1, based on homology to the proteins involved in the known interaction. Furthermore, interactions receive a higher score if they are derived from multiple experimental sources (n > 1). The score is additive: in the example here, the blue and green sequences are predicted to interact based on the interactions in 'species X' and 'species Y', and the overall score is the sum of both pairwise scores. This additive process continues over all experimentally determined protein pairs, N (e.g. through 'species Z'), for which the rat sequences, labelled blue and green, are present.
Figure 2. The distribution of bit scores as a function of sequence identity. The sequence identity and bit score of each hit when proteins in the interaction data were queried against the rat genome. The solid red line shows the best linear fit to the data; the dotted red line, starting at the origin, has 97% of the data in the area below it. Reading from these lines at 30% sequence identity gives bit scores of 86 and 177, respectively, yielding interaction scores of 9 and 10 when inserted into Equation 1. To ensure a stringent criterion for the minimum interaction score, the higher value was selected as the cutoff score.
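The scoring described in the captions of Figures 1 and 2 can be sketched as follows. Equation 1 is not reproduced in this section, so the form used here is an assumption: the natural logarithm of the summed products of the two homologues' bit scores. That form is consistent with the values quoted for Figure 2 (bit scores of 86 and 177 give scores of roughly 9 and 10), but it may not match the published equation exactly, and the weighting for multiple experimental sources (n > 1) is left out. The function name `interaction_score` is hypothetical.

```python
import math

def interaction_score(bit_score_pairs):
    """Assumed form of Equation 1: ln of the summed bit-score products.

    bit_score_pairs holds one (score_a, score_b) tuple per experimentally
    observed interaction that supports the predicted pair.
    """
    total = sum(s_a * s_b for s_a, s_b in bit_score_pairs)
    return math.log(total)

# Bit scores read from Figure 2 at 30% sequence identity.
score_linear_fit = interaction_score([(86, 86)])     # best linear fit
score_97_percent = interaction_score([(177, 177)])   # line bounding 97% of hits

# Additivity: support from a second known interaction raises the score.
score_two_sources = interaction_score([(177, 177), (86, 86)])

print(round(score_linear_fit), round(score_97_percent))
print(score_two_sources > score_97_percent)
```

Under this assumed form, evidence from further species only ever increases the score, which matches the additive behaviour described for Figure 1.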
Distribution of protein-protein interaction scores. Interaction scores of X-ray crystal structures (n = 377) compared to the scores of all (genome-wide) predicted interactions.
| Data set | Interaction score 0 – 10 (% of interactions) | Interaction score > 10 (% of interactions) |
| X-ray crystal structures | 6.4 | 93.6 |
| Genome-wide | 43.2 | 56.8 |
Figure 3. Identifying protein communities by cluster analysis. The communities identified by k-clique analysis performed on the predicted genome-wide rat protein network. The communities are distinguished by different colours and labelled by the overall function or the dominating protein class. Note that proteins, particularly at community edges, can belong to more than two communities, although this is not shown. A complete list of protein names is included as supplementary material [see Additional file 1]. The graph was created with Graphviz [61].
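As an illustration of k-clique analysis, the sketch below implements clique percolation in plain Python: every k-clique is enumerated, cliques sharing k-1 nodes are merged, and each merged group forms one community. This is a toy version under stated assumptions, not the software used for the figure; the brute-force enumeration only suits very small graphs, and the node names and the function `k_clique_communities` are hypothetical.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return communities as sets: unions of k-cliques chained by k-1 shared nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Brute-force enumeration of all k-cliques (toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two k-cliques are adjacent when they share exactly k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Hypothetical toy network: two triangles sharing an edge, plus a third
# triangle attached through the single node "D".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
communities = k_clique_communities(edges, 3)
print(communities)
```

In the toy network, node "D" ends up in both communities, which illustrates how a protein at a community edge can belong to more than one community. Raising k to 4 leaves no communities here, since the graph contains no 4-clique.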
Domain frequency within the clustered communities. The table shows the most frequently observed domains in the metastasis-related cluster communities (observed frequencies) alongside the expected domain frequencies, based on the domain composition of the whole rat genome. The n-fold difference was calculated from the frequency percentages (numbers within parentheses).
| Domain | Observed frequency (%) | Expected frequency (%) | n-fold difference |
| Spectrin repeat | 56 (6.9) | 6 (0.7) | 8.3 |
| IQ calmodulin-binding motif | 54 (6.6) | 2 (0.2) | 26.5 |
| EGF-like domain | 52 (6.4) | 16 (2.0) | 2.2 |
| Protein kinase domain | 47 (5.8) | 12 (1.4) | 3.0 |
| SH2 domain | 27 (3.3) | 2 (0.3) | 11.7 |
| EF hand | 25 (3.1) | 7 (0.8) | 2.6 |
| Immunoglobulin domain | 21 (2.6) | 35 (4.3) | -0.4 |
| SH3 domain | 20 (2.4) | 6 (0.7) | 2.6 |
| Calponin homology (CH) domain | 13 (1.6) | 2 (0.3) | 5.4 |
| Proteasome A-type and B-type | 12 (1.5) | 1 (0.1) | 20.0 |
| LIM domain | 11 (1.3) | 3 (0.4) | 2.7 |
| Transforming growth factor | 10 (1.2) | 1 (0.1) | 11.2 |
Figure 4. A closer view of a part of the 'intracellular signalling cascade'. The figure shows a subsection of the network around the intracellular signalling cascade where it extends to the VEGFs and JAK/STAT protein communities. The confidence of the interactions is shown by colour coding based on the interaction scores ranging from low-scoring blue (10 ≤ s < 10.5) to high-scoring red (s > 40.0). The metastatic cell line expression levels are also shown; blue for down-regulated genes and red for up-regulated ones.
The connectivity of up- and down-regulated proteins. Observed and expected frequencies of pairwise protein interactions, categorised by their expression: N-N (non-expressed protein interacting with non-expressed protein), U-U (up-regulated protein interacting with up-regulated protein), D-D (down-regulated protein interacting with down-regulated protein) and U-D (up-regulated interacting with down-regulated). For the purpose of the classification, up-regulated proteins are those up-regulated more than 20% and down-regulated proteins down-regulated more than 20%. Expected values were calculated based on a random distribution of the expression data on the network (p < 0.001 for a χ²-test).
| Interaction type | Observed | Expected | Ratio (observed/expected) |
| N-N | 8 | 5 | 1.5 |
| U-U | 121 | 109 | 1.1 |
| D-D | 17 | 41 | 0.4 |
| U-D | 71 | 67 | 1.1 |
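The reported significance can be checked approximately from the tabulated counts. The sketch below computes a Pearson χ² statistic over the four interaction categories and compares it with the critical value for three degrees of freedom at p = 0.001 (16.266). The exact test set-up is not given in this section, so treating the table as a four-category goodness-of-fit test is an assumption, and the expected counts used here are the rounded values from the table.

```python
# Observed and expected interaction counts, copied from the table above.
observed = {"N-N": 8, "U-U": 121, "D-D": 17, "U-D": 71}
expected = {"N-N": 5, "U-U": 109, "D-D": 41, "U-D": 67}

def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((obs[key] - exp[key]) ** 2 / exp[key] for key in obs)

# Critical value of the chi-square distribution, 3 degrees of freedom, p = 0.001.
CRITICAL_DF3_P001 = 16.266

statistic = chi_square(observed, expected)
significant = statistic > CRITICAL_DF3_P001

# Observed/expected ratio per category, as in the last table column.
ratios = {key: observed[key] / expected[key] for key in observed}

print(f"chi-square = {statistic:.2f}, significant at p < 0.001: {significant}")
```

Most of the statistic comes from the D-D category, where far fewer interactions are observed than expected.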
Experimental sources for building the interactome. Summary of the experiments used as a foundation for building the interactome, from most frequent (top) to least frequent (bottom). The percentage of the total is listed after each value.
| Method | Frequency (%) |
| Two hybrid test | 35,759 (69.9) |
| Immunoprecipitation | 6,290 (12.3) |
| Tandem Affinity Purification (TAP) | 3,503 (6.85) |
| Affinity chromatography | 1,070 (2.09) |
| Copurification | 572 (1.12) |
| Cross-linking | 518 (1.01) |
| X-ray crystallography | 511 (1.00) |
| In vitro binding | 452 (0.88) |
| Biochemical/biophysical | 327 (0.64) |
| Gel filtration chromatography | 326 (0.64) |
| In vivo kinase activity assay | 185 (0.36) |
| Competition binding | 185 (0.36) |
| Immunoblotting | 140 (0.27) |
| Cosedimentation | 133 (0.26) |
| Gel retardation assays | 106 (0.21) |
| Native gel electrophoresis | 103 (0.20) |
| Other | 973 (1.90) |
Gene ontology cellular compartments. A simplified representation of gene ontology cellular compartments. Protein accessibility between compartments is represented by ones and zeros: the former indicates the possibility of interaction between respective compartments and the latter excludes any interactions.
| | Extracellular | Intracellular | Cytoplasm | Nucleus | Mitochondrion | Membrane |
| Extracellular | 1 | 0 | 0 | 0 | 0 | 1 |
| Intracellular | 0 | 1 | 1 | 1 | 1 | 1 |
| Cytoplasm | 0 | 1 | 1 | 0 | 0 | 1 |
| Nucleus | 0 | 1 | 0 | 1 | 0 | 1 |
| Mitochondrion | 0 | 1 | 0 | 0 | 1 | 0 |
| Membrane | 1 | 1 | 1 | 1 | 0 | 1 |
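The accessibility matrix above translates directly into a lookup table. The sketch below encodes it and shows one way it could be used to decide whether two proteins can plausibly meet. How the matrix was applied to the predictions is not described in this section, so the any-pair rule in `proteins_compatible` is an assumption, and both function names are hypothetical.

```python
COMPARTMENTS = ["Extracellular", "Intracellular", "Cytoplasm",
                "Nucleus", "Mitochondrion", "Membrane"]

# Rows and columns follow COMPARTMENTS; 1 = interaction possible, 0 = excluded.
ACCESS = [
    [1, 0, 0, 0, 0, 1],  # Extracellular
    [0, 1, 1, 1, 1, 1],  # Intracellular
    [0, 1, 1, 0, 0, 1],  # Cytoplasm
    [0, 1, 0, 1, 0, 1],  # Nucleus
    [0, 1, 0, 0, 1, 0],  # Mitochondrion
    [1, 1, 1, 1, 0, 1],  # Membrane
]
INDEX = {name: i for i, name in enumerate(COMPARTMENTS)}

def can_interact(comp_a, comp_b):
    """True if the matrix allows interaction between the two compartments."""
    return ACCESS[INDEX[comp_a]][INDEX[comp_b]] == 1

def proteins_compatible(comps_a, comps_b):
    """Assumed rule: compatible if any pair of annotated compartments is allowed."""
    return any(can_interact(a, b) for a in comps_a for b in comps_b)

print(can_interact("Extracellular", "Membrane"))
print(can_interact("Nucleus", "Cytoplasm"))
print(proteins_compatible({"Nucleus"}, {"Cytoplasm", "Membrane"}))
```

The matrix is symmetric, so the order of the two arguments does not matter.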
The number of protein communities at different clustering threshold values. The number of protein communities varies as the k-value for clustering is changed. The table shows the total number of separate protein communities for each k-value.
| Clustering threshold value (k) | Number of protein communities |
| | 145 |
| | 37 |
| | 12 |
| | 8 |
| | 2 |
| | 1 |
| | 1 |
| | 1 |
| | 1 |