Vera van Noort, Berend Snel, Martijn A Huynen.
Abstract
BACKGROUND: In the post-genomic era various functional genomics, proteomics and computational techniques have been developed to elucidate the protein interaction network. While some of these techniques are specific for a certain type of interaction, most predict a mixture of interactions. Qualitative labels are essential for the molecular biologist to experimentally verify predicted interactions.
Year: 2007 PMID: 17880677 PMCID: PMC2375035 DOI: 10.1186/gb-2007-8-9-r197
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Two categories of functional relationships. The two categories of functional relationships that we use in this study are physical interactions (specifically co-complex memberships) and metabolic interactions. The physical interactions (red) exist between proteins that are identified in the same protein complex, whereas the metabolic interactions (blue) exist between proteins that act in the same metabolic pathway. Metabolic interactions may also exist between individual members of a complex and proteins that act in the same pathway as this complex. One aspect of the nature of these two functional relationships is the physical distance between proteins, as illustrated in the graph. As the nature of the relationships differs, one might expect differential behavior in high-throughput experiments.
Figure 2. Score-ppv plots of individual datasets. On the x-axis is the score for that dataset, on the y-axis the positive predictive value (ppv). The ppv was calculated in all score intervals with bin-width 0.025. Red lines indicate ppv on the protein complex reference set, being the number of true positives in the complex reference set divided by the number of true positives and false positives in both reference sets. Blue lines indicate the ppv on the metabolic reference set, being the number of true positives in the metabolic reference set divided by the number of true positives and false positives in both reference sets. (a) Correlated mRNA expression (CoExp). (b) Shared binding of transcription factors (ChIP-chip). (c) Co-regulation (ChIP-chip*CoExp). (d) Conserved co-expression between four species (CoExp4Sp). (e) Conserved co-expression between two species (CoExp2Sp). (f) Paralogous conserved co-expression (CoExpPar). (g) Gene neighborhood conservation (GenNeigh). (h) Correlated phylogenetic profiles (CoOcc). (i) Shared genetic interactions (GenInt). (j) Yeast-two-hybrid (Y2H). (k) TAP-tag purifications (Gavin et al. [3]). (l) TAP-tag purifications (Krogan et al. [4]). For (k, l) the protein pairs that are never co-purified and thus have an SA score of 0 are in bin 0.2.
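The binned ppv described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(score, phys, met)` and the function name `binned_ppv` are hypothetical, and the shared denominator (true and false positives summed over both reference sets) is one reading of the caption.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.025  # bin width stated in the caption


def binned_ppv(records, bin_width=BIN_WIDTH):
    """Compute per-bin ppv on the physical and metabolic reference sets.

    records: iterable of (score, phys, met), where phys and met are
    True (positive in that reference set), False (negative in that
    reference set) or None (pair absent from that reference set).
    This layout is an assumption made for the sketch.
    Returns {bin_index: (ppv_physical, ppv_metabolic)}.
    """
    counts = defaultdict(lambda: {"tp_phys": 0, "fp_phys": 0,
                                  "tp_met": 0, "fp_met": 0})
    for score, phys, met in records:
        c = counts[math.floor(score / bin_width)]
        if phys is True:
            c["tp_phys"] += 1
        elif phys is False:
            c["fp_phys"] += 1
        if met is True:
            c["tp_met"] += 1
        elif met is False:
            c["fp_met"] += 1

    result = {}
    for b, c in counts.items():
        # Assumed denominator: TP and FP from both reference sets.
        total = sum(c.values())
        if total > 0:
            result[b] = (c["tp_phys"] / total, c["tp_met"] / total)
    return result


example = [
    (0.51, True, None), (0.52, False, None),
    (0.515, None, True), (0.52, None, False),
    (0.91, True, None),
]
ppv_by_bin = binned_ppv(example)
```

With a shared denominator the two ppv values in a bin can sum to at most 1, which is what allows a single bin to be compared on both reference sets.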
Logistic regression coefficients with metabolic and physical interactions
| Dataset | Input | Intercept | Coefficient | R² value |
| CoExp | Metabolic | -5.01* | 2.44* | 0.00766 |
| CoExp | Physical | -8.82* | 7.25* | 0.0588 |
| ChIP-chip | Metabolic | -3.37* | 0.568† | 0.000603 |
| ChIP-chip | Physical | -4.95* | 2.11* | 0.00666 |
| ChIP-chip*CoExp | Metabolic | -4.32* | 5.02* | 0.0246 |
| ChIP-chip*CoExp | Physical | -6.39* | 9.17* | 0.0745 |
| CoExp4Sp | Metabolic | -2.32* | 2.36† | 0.00706 |
| CoExp4Sp | Physical | -2.97* | 6.33* | 0.0546 |
| CoExp2Sp | Metabolic | -4.02* | 2.55* | 0.00594 |
| CoExp2Sp | Physical | -8.16* | 10.5* | 0.103 |
| CoExpPar | Metabolic | -2.62* | -2.07* | 0.00484 |
| CoExpPar | Physical | -7.48* | 6.17* | 0.0373 |
| GenNeigh | Metabolic | -2.69* | 2.96* | 0.0280 |
| GenNeigh | Physical | -5.65* | 6.20* | 0.219 |
| CoOcc | Metabolic | -1.71* | 1.49* | 0.0223 |
| CoOcc | Physical | -3.69* | 3.27* | 0.120 |
| GenInt | Metabolic | -4.18* | 4.06* | 0.0120 |
| GenInt | Physical | -3.16* | 11.3* | 0.113 |
| Y2H | Metabolic | -3.35* | -3.89* | 0.106 |
| Y2H | Physical | -2.30* | 4.29* | 0.119 |
| TAP-tag Gavin | Metabolic | -3.85* | -0.153 | 1.87e-06 |
| TAP-tag Gavin | Physical | -10.3* | 24.5* | 0.298 |
| TAP-tag Krogan | Metabolic | -3.68* | 0.0350 | 9.19e-08 |
| TAP-tag Krogan | Physical | -8.99* | 18.37* | 0.146 |
| TAP-tag (G+K) | Metabolic | -3.78* | -0.54 | 1.77e-05 |
| TAP-tag (G+K) | Physical | -12.3* | 32.7* | 0.322 |
The scores of the 'omics' datasets were in turn considered as the continuous independent variable to fit a logit function to the presence/absence of interactions. *P < 2e-16; †P < 0.0001. CoExp, correlated mRNA expression; ChIP-chip, shared binding of transcription factors; ChIP-chip*CoExp, co-regulation; CoExp4Sp, conserved co-expression between four species; CoExp2Sp, conserved co-expression between two species; CoExpPar, paralogous conserved co-expression; GenNeigh, gene neighborhood conservation; CoOcc, correlated phylogenetic profiles; GenInt, shared genetic interactions; Y2H, yeast-two-hybrid; TAP-tag Gavin, TAP-tag purifications (Gavin et al. [3]); TAP-tag Krogan, TAP-tag purifications (Krogan et al. [4]); TAP-tag (G+K), the sum of SA scores derived from the two TAP-tag purification data sets.
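To make the fitted coefficients concrete, the sketch below evaluates the standard logistic function p = 1 / (1 + exp(-(intercept + coefficient × score))) for the Physical row with intercept -8.82 and coefficient 7.25. The function name `interaction_probability` is hypothetical, and the scores 0 and 1 are illustrative values only; the actual score range of each dataset is not stated here.

```python
import math


def interaction_probability(score, intercept, coefficient):
    """Probability of an interaction under a fitted logit model."""
    logit = intercept + coefficient * score
    return 1.0 / (1.0 + math.exp(-logit))


# Coefficients taken from the Physical row with intercept -8.82.
# The two scores are assumed example inputs, not values from the study.
p_low = interaction_probability(0.0, -8.82, 7.25)
p_high = interaction_probability(1.0, -8.82, 7.25)
```

A positive coefficient means the predicted probability rises with the score, while a negative coefficient (as in some Metabolic rows) means higher scores make that type of interaction less likely.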
Figure 3. Differential ppv in the evidence landscape. In each panel the x-axis indicates the score in the first dataset, the y-axis the score in the second set. The color scheme is based on differential ppv, being the ppv on the metabolic reference set minus the ppv on the physical interaction reference set. Differential ppv 1 is dark blue, 0 is yellow and -1 is red; parts that contain no gene pairs are white. The blue parts of the landscapes are regions where there are only metabolic interactions, whereas in the red parts there are only physical interactions. (a) TAP-tag purifications (Krogan) versus TAP-tag purifications (Gavin). (b) TAP-tag purifications (sum Krogan Gavin) versus conserved co-expression (CoExp2Sp). (c) TAP-tag purifications (sum Krogan Gavin) versus co-regulation (ChIP-chip*CoExp). (d) TAP-tag purifications (Krogan) versus gene neighborhood conservation (GenNeigh). (e) Gene neighborhood conservation (GenNeigh) versus co-regulation (ChIP-chip*CoExp). (f) TAP-tag purifications (Gavin) versus co-regulation (ChIP-chip*CoExp). (g) Correlated phylogenetic profiles (CoOcc) versus gene neighborhood conservation (GenNeigh). (h) Gene neighborhood conservation (GenNeigh) versus conserved co-expression (CoExp2Sp). (i) Paralogous conserved co-expression (CoExpPar) versus conserved co-expression (CoExp4Sp).
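The differential ppv for one square of the landscape can be sketched as below. This is an illustration under the assumption that both ppv values share the same denominator (true and false positives from both reference sets); the function name and argument layout are hypothetical.

```python
def differential_ppv(tp_met, fp_met, tp_phys, fp_phys):
    """Metabolic ppv minus physical ppv for one square of the landscape.

    Returns None for squares that contain no reference gene pairs
    (drawn white in the figure).
    """
    # Assumed shared denominator across both reference sets.
    total = tp_met + fp_met + tp_phys + fp_phys
    if total == 0:
        return None
    return (tp_met - tp_phys) / total


only_metabolic = differential_ppv(10, 0, 0, 0)
only_physical = differential_ppv(0, 0, 8, 0)
mixed = differential_ppv(3, 2, 3, 2)
empty = differential_ppv(0, 0, 0, 0)
```

Under this reading, a value of 1 arises only when every reference pair in the square is a true positive metabolic interaction, and -1 only when every pair is a true positive physical interaction, matching the dark blue and red extremes of the color scheme.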
Figure 4. Network with qualitative labels on predicted interactions. (a) The network of interactions in yeast that are specifically predicted to be physical (red lines) or metabolic (blue lines). We took all gene pairs that fell into squares (Figure 3) with a differential ppv larger than 0.95 and at least five true positive metabolic interactions for the specific metabolic interactions. We selected all gene pairs that fell into squares with differential ppv smaller than -0.95 and at least five true positive physical interactions for the specific physical interactions. Names of several known complexes and metabolic pathways are indicated on the network. (b) The arginine biosynthesis pathway in yeast. Names of the enzymes are in orange, arrows indicate biochemical reactions. Blue lines indicate all interactions that exist for these genes. Note that ECM40 catalyzes two steps in this pathway but the interactions with the other genes are drawn only once.
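The selection rule in panel (a) can be written as a small function. The thresholds (0.95 and five true positives) come from the caption; the function name `label_square` and its signature are hypothetical, and the differential ppv of the square is assumed to have been computed beforehand.

```python
def label_square(diff_ppv, tp_met, tp_phys, cutoff=0.95, min_tp=5):
    """Assign a qualitative label to all gene pairs in one square.

    diff_ppv: differential ppv of the square (metabolic minus physical),
              or None if the square holds no reference pairs.
    Returns "metabolic", "physical", or None when no specific label applies.
    """
    if diff_ppv is None:
        return None
    if diff_ppv > cutoff and tp_met >= min_tp:
        return "metabolic"
    if diff_ppv < -cutoff and tp_phys >= min_tp:
        return "physical"
    return None


labels = [
    label_square(1.0, 12, 0),    # strongly metabolic, enough support
    label_square(-0.98, 0, 7),   # strongly physical, enough support
    label_square(1.0, 3, 0),     # too few true positives
    label_square(0.4, 20, 9),    # mixed evidence
]
```

The minimum true positive count guards against squares that reach an extreme differential ppv only because they contain very few reference pairs.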
True positives and false positives
| | Positive metabolic | Negative metabolic | Positive physical | Negative physical |
| Present in bin | TP | FP | TP | FP |