Lan V Zhang, Sharyl L Wong, Oliver D King, Frederick P Roth.
Abstract
BACKGROUND: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore, it is desirable to integrate multiple datasets and exploit their differing predictive value to predict co-complexed relationships more accurately.
Year: 2004 PMID: 15090078 PMCID: PMC419405 DOI: 10.1186/1471-2105-5-38
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Categories of gene- and protein-pair attributes used
| I. | High-throughput screens (HTS) of interactions | 11 | [2, 3, 6, 7] |
| X. | Correlated mRNA expression | 23 | [42, 43] |
| R. | Same transcriptional regulator | 229 | [27] |
| L. | Same subcellular localization (high-throughput) | 16 | [44] |
| P. | Same knockout phenotype | 181 | [25] |
| H. | Sequence homology | 4 | [45] |
| U. | Gene fusion | 1 | [5] |
| N. | Gene neighborhood | 1 | [5] |
| O. | Gene co-occurrence in phylogenetic profiles | 1 | [5] |
Additional categories of gene- and protein-pair attributes
| S. | Same subcellular localization (MIPS) | 43 | [25] |
| F. | Same function (MIPS) | 258 | [25] |
| C. | Same protein class (MIPS) | 191 | [25] |
Figure 1. Decision tree constructed using all protein pairs. Each leaf node is labeled with the numbers of CCPs and non-CCPs associated with it, while each internal node is labeled with the attribute (j) used for subsequent partitioning (see Table 4 or Supplementary Information for descriptions of the attributes). Two edges originate from each internal node, labeled "+" or "-," corresponding to the daughter nodes that have or do not have attribute j, respectively. Nodes with percentages of CCPs higher than that of the root node are colored red, while those with lower CCP percentages are blue. The color saturation depends on the relative entropy compared with the root node. The arrowhead size of an edge from a given node approximately represents the fraction of protein pairs in the parent node assigned to the corresponding daughter node.
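The node coloring described in the caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: it assumes that "relative entropy" means the Kullback-Leibler divergence (in bits) of a node's CCP/non-CCP distribution from that of the root, and the function names `relative_entropy` and `node_color` are hypothetical.

```python
import math


def relative_entropy(p_node, p_root):
    """KL divergence in bits between Bernoulli(p_node) and Bernoulli(p_root).

    Assumes 0 < p_root < 1, i.e. the root contains both CCPs and non-CCPs.
    """
    total = 0.0
    for p, q in ((p_node, p_root), (1.0 - p_node, 1.0 - p_root)):
        if p > 0:  # a zero-probability term contributes nothing
            total += p * math.log2(p / q)
    return total


def node_color(n_ccp, n_non_ccp, root_ccp, root_non_ccp):
    """Return (hue, saturation_driver) for one tree node.

    The hue follows the caption: red if the node is enriched for CCPs
    relative to the root, blue if it is depleted. How the relative entropy
    is mapped onto a saturation scale is not stated in the source, so the
    raw value is returned.
    """
    p_node = n_ccp / (n_ccp + n_non_ccp)
    p_root = root_ccp / (root_ccp + root_non_ccp)
    if p_node > p_root:
        hue = "red"
    elif p_node < p_root:
        hue = "blue"
    else:
        hue = "neutral"
    return hue, relative_entropy(p_node, p_root)
```

A node whose CCP fraction equals that of the root has relative entropy zero under this definition, so it would be drawn unsaturated.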
Top 20 attributes ranked by reduction in entropy provided by partitioning the root node
| R_p001.FHL1 | 9.5e-4 | 25.7% | Bound by Fhl1p, p < 0.001 |
| R_p005.FHL1 | 9.3e-4 | 25.3% | Bound by Fhl1p, p < 0.005 |
| X_cc.p.8 | 7.6e-4 | 20.7% | Correlated mRNA expression, cell cycle dataset, cc > 0.8 |
| X_cc.p.7 | 7.4e-4 | 20.0% | Correlated mRNA expression, cell cycle dataset, cc > 0.7 |
| X_cc.p.6 | 6.0e-4 | 16.2% | Correlated mRNA expression, cell cycle dataset, cc > 0.6 |
| R_p001 | 6.0e-4 | 15.9% | Same transcriptional regulator, p < 0.001 |
| R_p005.RAP1 | 5.0e-4 | 13.6% | Bound by Rap1p, p < 0.005 |
| X_cc | 5.0e-4 | 13.4% | Correlated mRNA expression, cell cycle dataset |
| X | 5.0e-4 | 13.4% | Correlated mRNA expression |
| R_p005 | 4.3e-4 | 11.6% | Same transcriptional regulator, p < 0.005 |
| I_APMS.TAP | 3.0e-4 | 8.2% | TAP |
| R_p001.RAP1 | 3.0e-4 | 8.2% | Bound by Rap1p, p < 0.001 |
| I_APMS | 2.7e-4 | 7.3% | APMS |
| I | 2.7e-4 | 7.3% | High-throughput screens (HTS) of interactions |
| I_APMS.TAP.spoke | 1.5e-4 | 4.1% | TAP, "spoke" model |
| X_cc.p.9 | 1.4e-4 | 3.7% | Correlated mRNA expression, cell cycle dataset, cc > 0.9 |
| X_Rst.p.6 | 1.2e-4 | 3.3% | Correlated mRNA expression, Rosetta compendium, cc > 0.6 |
| N | 1.2e-4 | 3.2% | Gene neighborhood |
| X_Rst | 1.1e-4 | 2.8% | Correlated mRNA expression, Rosetta compendium |
| I_APMS.HMS-PCI | 7.3e-5 | 2.0% | HMS-PCI |
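The ranking criterion named in the table title, reduction in entropy from partitioning the root node on a single binary attribute, can be sketched as below. This is a hedged illustration using the standard information-gain formula; the function names are hypothetical, and reading the percentage column as the gain divided by the root entropy is an assumption that the text here does not confirm.

```python
import math


def entropy(pos, neg):
    """Shannon entropy in bits of a two-class count (CCPs vs. non-CCPs)."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            frac = count / total
            h -= frac * math.log2(frac)
    return h


def entropy_reduction(with_attr, without_attr):
    """Information gain of a binary split of the root node.

    with_attr, without_attr: (ccp_count, non_ccp_count) for pairs that have
    or do not have the attribute. Returns (absolute_gain, relative_gain),
    where relative_gain is the gain as a fraction of the root entropy
    (an assumed interpretation of the percentage column).
    """
    pos = with_attr[0] + without_attr[0]
    neg = with_attr[1] + without_attr[1]
    n = pos + neg
    h_root = entropy(pos, neg)
    h_split = sum(
        (a + b) / n * entropy(a, b) for a, b in (with_attr, without_attr)
    )
    gain = h_root - h_split
    return gain, (gain / h_root if h_root > 0 else 0.0)
```

Because CCPs are a small minority of all protein pairs, the root entropy is itself small, which is consistent with absolute reductions on the order of 1e-4 to 1e-3 bits.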
Attributes used in the decision tree
| I | High-throughput screens (HTS) of interaction |
| I_APMS.TAP | Tandem-affinity purification (TAP) |
| I_APMS.TAP.spoke | Tandem-affinity purification (TAP), "spoke" model |
| I_APMS.HMS-PCI | High-throughput mass spectrometric protein complex identification (HMS-PCI) |
| I_APMS.HMS-PCI.spoke | High-throughput mass spectrometric protein complex identification (HMS-PCI), "spoke" model |
| I_Y2H | Yeast two-hybrid (Y2H) |
| I_Y2H.Uetz | Yeast two-hybrid (Y2H), Uetz |
| X | Correlated mRNA expression |
| X_Rst | Correlated mRNA expression, Rosetta compendium |
| X_Rst.p | Positively correlated mRNA expression, Rosetta compendium |
| X_Rst.p.8 | Correlated mRNA expression, Rosetta compendium, cc > 0.8 |
| X_cc.p | Positively correlated mRNA expression, cell cycle dataset |
| X_cc.p.7 | Correlated mRNA expression, cell cycle dataset, cc > 0.7 |
| X_cc.p.8 | Correlated mRNA expression, cell cycle dataset, cc > 0.8 |
| X_cc.p.9 | Correlated mRNA expression, cell cycle dataset, cc > 0.9 |
| R | Same transcriptional regulator |
| R_p005.ABF1 | Bound by Abf1p, p < 0.005 |
| R_p005.GRF10 | Bound by Grf10p, p < 0.005 |
| R_p005.HAP4 | Bound by Hap4p, p < 0.005 |
| R_p005.RAP1 | Bound by Rap1p, p < 0.005 |
| R_p005.RME1 | Bound by Rme1p, p < 0.005 |
| R_p005.SFP1 | Bound by Sfp1p, p < 0.005 |
| R_p005.SWI4 | Bound by Swi4p, p < 0.005 |
| R_p005.YAP5 | Bound by Yap5p, p < 0.005 |
| R_p001.FHL1 | Bound by Fhl1p, p < 0.001 |
| R_p001.HAP4 | Bound by Hap4p, p < 0.001 |
| R_p001.HIR2 | Bound by Hir2p, p < 0.001 |
| R_p001.RAP1 | Bound by Rap1p, p < 0.001 |
| R_p001.REB1 | Bound by Reb1p, p < 0.001 |
| L | Same subcellular localization (high-throughput) |
| L_05 | ER |
| L_08 | Mitochondrial |
| L_10 | Nucleus |
| L_04 | Cytoplasm |
| P | Same Phenotype |
| P_1 | Conditional phenotypes |
| P_1.1 | Heat-sensitivity |
| P_1.3 | Slow-growth |
| P_2 | Cell cycle defects |
| P_2.4 | Other cell cycle defects |
| P_4.2 | Methionine auxotrophy |
| P_4.5.4 | Respiratory deficiency |
| P_5 | Cell morphology and organelle mutants |
| P_5.2.5 | Other budding mutants |
| P_5.3 | Cell wall mutants |
| P_5.6.1 | Tubulin cytoskeleton mutants |
| P_5.6.1.5 | Other tubulin cytoskeleton mutants |
| P_5.6.2 | Actin cytoskeleton mutants |
| P_5.9 | Secretory mutants |
| P_5.11 | Mitochondrial mutants |
| P_5.13.2 | Other vacuolar mutants |
| P_5.14 | Other cell morphology mutants |
| P_8 | Nucleic acid metabolism defects |
| P_8.1 | DNA repair mutants |
| P_8.1.1 | UV light sensitivity |
| P_8.2 | DNA replication mutants |
| P_9.9 | Staurosporine sensitivity |
| H | Sequence homology, E < e-6 |
| H.e-12 | Sequence homology, E < e-12 |
| N | Gene neighborhood |
| O | Gene co-occurrence |
Figure 2. ROC curves for predictions based on: all attributes (black), all attributes except the category "high-throughput screens of interaction" (yellow), all attributes except the category "correlated mRNA expression" (green), all attributes except the category "same transcriptional regulator" (red), all attributes except the category "sequence homology" (blue) and all attributes together with the categories "same subcellular localization (MIPS)", "same function (MIPS)" and "same protein class (MIPS)" (grey). The expected ROC curve for random guesses is the diagonal where true-positive rate equals false-positive rate (black dotted line). A-C show the same ROC curve at different resolutions.
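For readers who want to reproduce curves of this kind, the sketch below computes ROC points from per-pair scores and reference labels. It is a generic illustration, not the authors' code; the names `roc_points` and `auc` are hypothetical. Pairs with tied scores are moved across the threshold together, which matters for decision tree output because all pairs in one leaf share a score.

```python
def roc_points(scores, labels):
    """Return [(false_positive_rate, true_positive_rate), ...].

    scores: predicted scores, higher means more likely co-complexed.
    labels: 1 for a reference CCP, 0 for a non-CCP.
    Assumes at least one positive and one negative label.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every pair sharing this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A predictor that assigns the same score to every pair yields only the two endpoints of the diagonal, matching the random-guess baseline in the figure.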
Figure 3. A: Decision tree predictions compared with four high-throughput datasets and their simple combinations. B and C: Decision tree predictions compared with two APMS studies: TAP (B) and HMS-PCI (C), respectively. Only protein pairs covered by each respective study (using the "spoke" model [30]) were considered. Black solid line: decision tree predictions using all attributes; blue solid line: decision tree predictions using only high-throughput interaction datasets; grey solid line: decision tree predictions using all attributes together with the categories "same function" and "same protein class"; black dotted line: expected performance of random guesses.
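The "spoke" model referred to in the caption, and the "matrix" model it is usually contrasted with, can be illustrated with a short sketch. It assumes the usual definitions: the spoke model pairs the bait only with each co-purified prey, while the matrix model pairs every protein in a purification with every other. The function names and the bait/prey representation are hypothetical.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: only bait-prey pairs are counted as interactions."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: every pair among the bait and its preys is counted."""
    members = set(preys) | {bait}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}
```

For a purification with bait `A` and preys `B` and `C`, `spoke_pairs` gives the pairs A-B and A-C, and `matrix_pairs` additionally gives B-C. The matrix model therefore covers more pairs but includes more pairs that never touched the bait directly.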
Top predictions not annotated as CCPs in the reference set. The 50 top-scoring protein pairs not annotated in our reference set (so-called "false positives") with results of a further search for pre-existing evidence of CCP. Fifteen of them are shown to be true CCPs according to YPD; for these, the last column names the complex.
| 1 | Rpl40Bp | Rps31p | 0.943 | |
| 2 | Rps31p | Rpl40Ap | 0.938 | |
| 3 | Smc1p | Smc3p | 0.864 | Cohesin |
| 4 | Gpt2p | Sec28p | 0.857 | |
| 5 | Pwp2p | Utp13p | 0.844 | Small subunit processome |
| 5 | Sgn1p | Pub1p | 0.844 | |
| 7 | Rdh54p | Rad5p | 0.833 | |
| 7 | Arp3p | Rvs167p | 0.833 | |
| 7 | Arp3p | Srv2p | 0.833 | |
| 10 | Spt5p | Rpb3p | 0.800 | Paf1p complex |
| 10 | Spt5p | Rpo21p | 0.800 | Paf1p complex |
| 12 | Pwp2p | Dip2p | 0.776 | Small subunit processome |
| 12 | Pwp2p | Ylr409C | 0.776 | |
| 12 | Sap190p | Sap155p | 0.776 | |
| 12 | Sap190p | Sap185p | 0.776 | |
| 12 | Pph21p | Pph22p | 0.776 | |
| 12 | Nop7p | Fpr4p | 0.776 | |
| 12 | Sap185p | Sap155p | 0.776 | |
| 12 | Sik1p | Cbf5p | 0.776 | |
| 12 | Nop2p | Ebp2p | 0.776 | Pre-60S ribosomal particle |
| 12 | Rpa135p | Ret1p | 0.776 | |
| 22 | Pwp2p | Asc1p | 0.750 | |
| 22 | Drs1p | Spb4p | 0.750 | |
| 24 | Rsm10p | Mrps5p | 0.744 | Mrp4p-associated complex (mitochondrial ribosome) |
| 24 | Mtr3p | Rrp45p | 0.744 | Exosome 3'-5' exoribonuclease complex |
| 24 | Rrp40p | Rrp46p | 0.744 | Exosome 3'-5' exoribonuclease complex |
| 24 | Rrp40p | Ski6p | 0.744 | Exosome 3'-5' exoribonuclease complex |
| 28 | Fun12p | Cbf5p | 0.743 | |
| 28 | Mrpl16p | Yml025Cp | 0.743 | |
| 28 | Mrpl1p | Mrpl9p | 0.743 | |
| 28 | Mrpl9p | Ypl183C-Ap | 0.743 | |
| 28 | Rrp40p | Rrp45p | 0.743 | Exosome 3'-5' exoribonuclease complex |
| 33 | Gin4p | Kcc4p | 0.727 | |
| 33 | Ecm16p | Prp43p | 0.727 | |
| 35 | Rps27Ap | Rpl42Bp | 0.714 | |
| 35 | Rps17Bp | Rpl36Ap | 0.714 | |
| 35 | Rps4Ap | Rpp2Ap | 0.714 | |
| 35 | Dur1,2p | Pdb1p | 0.714 | |
| 35 | Rsm7p | Mrps5p | 0.714 | Mrp4p-associated complex (mitochondrial ribosome) |
| 40 | Pat1p | Lsm2p | 0.692 | mRNA decay complex |
| 40 | Hrp1p | Nab2p | 0.692 | |
| 42 | Mrpl1p | Mrpl10p | 0.684 | |
| 42 | Mrpl9p | Yml025Cp | 0.684 | |
| 44 | Lsm2p | Dhh1p | 0.667 | 45S penta-snRNP |
| 44 | Pat1p | Dhh1p | 0.667 | 45S penta-snRNP |
| 46 | Dyn1p | Cdc55p | 0.667 | |
| 46 | Emp24p | Fks1p | 0.667 | |
| 46 | Yef3p | Act1p | 0.667 | |
| 46 | Yef3p | Pph22p | 0.667 | |
| 46 | Asc1p | Tfp1p | 0.667 | |
Figure 4. The rRNA processing complex with candidate members predicted by the decision tree. Red circles represent members of the complex annotated in MIPS. Green and yellow circles are proteins found to be co-complexed with the MIPS complex members by the decision tree with a score higher than 0.5. The yellow ones are verified in YPD while the green ones are not. The width of each edge is proportional to the decision tree score of the corresponding protein pair. Edges with scores lower than 0.1 as well as edges between the MIPS complex members are not shown.
Figure 5. Correlation between scores from decision tree predictions and the fractions verified by YPD. For each of the four datasets (TAP spoke, TAP matrix, HMS-PCI spoke and HMS-PCI matrix), we plotted the fractions of its protein pairs at different score intervals that are also annotated in YPD.
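The quantity plotted in Figure 5 can be sketched as a simple binning of scored pairs. This is an assumed reconstruction: the caption does not give the interval width, so equal-width bins over [0, 1] are used here, and the function name and input layout are hypothetical.

```python
def fraction_verified_by_bin(scored_pairs, verified, n_bins=10):
    """Fraction of pairs in each score interval that are also in `verified`.

    scored_pairs: dict mapping a protein pair to its decision tree score,
                  assumed to lie in [0, 1].
    verified:     set of pairs annotated as co-complexed (e.g. in YPD).
    Returns a list of (bin_low, bin_high, fraction), with fraction set to
    None for intervals that contain no pairs.
    """
    counts = [[0, 0] for _ in range(n_bins)]  # [total, verified] per bin
    for pair, score in scored_pairs.items():
        # A score of exactly 1.0 is placed in the top bin.
        index = min(int(score * n_bins), n_bins - 1)
        counts[index][0] += 1
        counts[index][1] += pair in verified
    return [
        (b / n_bins, (b + 1) / n_bins, hit / total if total else None)
        for b, (total, hit) in enumerate(counts)
    ]
```

Running this separately on the pairs derived from each dataset (TAP spoke, TAP matrix, HMS-PCI spoke, HMS-PCI matrix) would give one curve per dataset, as in the figure.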