Rob M Ewing, Peter Chu, Fred Elisma, Hongyan Li, Paul Taylor, Shane Climie, Linda McBroom-Cerajewski, Mark D Robinson, Liam O'Connor, Michael Li, Rod Taylor, Moyez Dharsee, Yuen Ho, Adrian Heilbut, Lynda Moore, Shudong Zhang, Olga Ornatsky, Yury V Bukhman, Martin Ethier, Yinglun Sheng, Julian Vasilescu, Mohamed Abu-Farha, Jean-Philippe Lambert, Henry S Duewel, Ian I Stewart, Bonnie Kuehl, Kelly Hogue, Karen Colwill, Katharine Gladwish, Brenda Muskat, Robert Kinach, Sally-Lin Adams, Michael F Moran, Gregg B Morin, Thodoros Topaloglou, Daniel Figeys.
Abstract
Mapping protein-protein interactions is an invaluable tool for understanding protein function. Here, we report the first large-scale study of protein-protein interactions in human cells using a mass spectrometry-based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large-scale immunoprecipitation of Flag-tagged versions of these proteins followed by LC-ESI-MS/MS analysis resulted in the identification of 24,540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross-validated using previously published and predicted human protein interactions. In-depth mining of the data set shows that it represents a valuable source of novel protein-protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.
Year: 2007 PMID: 17353931 PMCID: PMC1847948 DOI: 10.1038/msb4100134
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Figure 1. Data processing summary. Pie chart showing categorization of all immunoprecipitation experiments by type.
Figure 2. IP-HTMS data analysis pipeline. (A–B) All bands from the lane(s) corresponding to a bait are extracted and MS/MS data are acquired. (C) Data from each MS/MS acquisition are searched against a non-redundant human protein sequence database using the Mascot search engine. (D) Data from all bands corresponding to each bait are merged, and proteins and peptides are clustered to generate a non-redundant list of protein identifications. (E) Spurious proteins and promiscuous binding proteins are removed. (F) A data table is produced for each bait protein with all of the scoring information, including scores and ranks by band and experiment. This data table contains all data required for the estimation of bait–prey interaction probability. (G) An interaction confidence score is calculated based upon a partial least squares model trained on the replicated subset of the data.
Summary of IP-HTMS interactome network filtering
| Filtering step | Baits | Unique proteins | Interactions |
|---|---|---|---|
| Unfiltered interaction network | 407 | 2826 | 24 540 |
| Remove bait–bait interactions | 407 | 2826 | 24 211 |
| Remove spill-over interactions | 407 | 2826 | 24 005 |
| Remove frequent binders and control experiment proteins | 338 | 2235 | 6463 |
Bait–bait interactions: those instances where the bait protein was identified in the mass spectrometry experiment.
Spill-over interactions: observations of apparent spill-over from one gel lane to another, detected by manual examination of gels and peptide/protein identification data.
Frequent binders are defined as prey proteins identified for ⩾5% of baits; control proteins are those 'prey' proteins identified in ⩾2.5% of control experiments.
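The frequent-binder and control-protein criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `filter_frequent_binders`, its arguments, and the input layout (a list of bait–prey pairs plus a per-prey count of control experiments) are assumptions made for the example; only the 5% and 2.5% thresholds come from the table notes.

```python
from collections import defaultdict


def filter_frequent_binders(interactions, control_hits, n_controls,
                            bait_frac=0.05, control_frac=0.025):
    """Drop preys seen for >= bait_frac of baits or in >= control_frac of controls.

    interactions: iterable of (bait, prey) pairs (hypothetical input layout).
    control_hits: dict mapping prey -> number of control experiments containing it.
    n_controls:   total number of control experiments.
    """
    interactions = list(interactions)
    baits = {bait for bait, _ in interactions}

    # Collect the distinct baits each prey was identified with.
    baits_per_prey = defaultdict(set)
    for bait, prey in interactions:
        baits_per_prey[prey].add(bait)

    def is_removed(prey):
        frequent = len(baits_per_prey[prey]) / len(baits) >= bait_frac
        in_controls = (n_controls > 0 and
                       control_hits.get(prey, 0) / n_controls >= control_frac)
        return frequent or in_controls

    return [(bait, prey) for bait, prey in interactions if not is_removed(prey)]


# Toy data: 40 baits, each with one unique prey, plus a sticky prey "HSP"
# seen with 3 of 40 baits and a prey "P5" seen in 2 of 40 control experiments.
toy = [(f"B{i}", f"P{i}") for i in range(40)]
toy += [("B0", "HSP"), ("B1", "HSP"), ("B2", "HSP")]
kept = filter_frequent_binders(toy, control_hits={"P5": 2}, n_controls=40)
```

In the toy data, "HSP" is dropped as a frequent binder and "P5" as a control protein, while preys seen with a single bait are kept.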
Comparison of IP-HTMS data set to other sources of human protein–protein interactions
| Protein–protein interaction data set | Known | Predicted | Experimental (Y2H) |
|---|---|---|---|
| Interactions | 31 183 | 20 469 | 6727 |
| IP-HTMS baits featured in data set | 216 | 123 | 94 |
| Overlap with IP-HTMS space | 2332 | 668 | 366 |
| Intersection with IP-HTMS (number of interactions, percentage of total) | 149, 6.4% | 78, 11.4% | 29, 7.9% |
| Randomly permuted intersection with IP-HTMS (min, mean, max) | 7, 14.3, 25 | 3, 8.0, 14 | 0, 1.8, 7 |
| Statistical significance of intersection (fold-enrichment) | ∼10-fold | ∼10-fold | ∼15-fold |
Known interactions: Ramani et al.
Predicted interactions: Lehner and Fraser (2004).
Experimental (Y2H) interactions: Rual et al.
IP-HTMS baits (from total of 343) featuring in the data set.
Number of interactions in the data set featuring one or more IP-HTMS baits.
Number of shared interactions between data set and IP-HTMS.
Number of shared interactions between randomly permuted (1000 iterations) IP-HTMS and data set.
Fold enrichment of observed intersection over intersection expected by chance.
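The permutation comparison and fold enrichment described in the notes above can be sketched as follows. The source does not state how the IP-HTMS data were permuted, so shuffling prey labels across the bait–prey pairs is an assumption made here, as are the function names `permuted_overlap` and `fold_enrichment`. Interactions are treated as undirected pairs.

```python
import random


def permuted_overlap(ip_pairs, reference_pairs, n_iter=1000, seed=0):
    """Return (observed, min, mean, max) overlap with a reference set.

    Assumption: each permutation shuffles the prey labels among the
    bait-prey pairs; the original study may have permuted differently.
    """
    rng = random.Random(seed)
    reference = {frozenset(pair) for pair in reference_pairs}
    baits = [bait for bait, _ in ip_pairs]
    preys = [prey for _, prey in ip_pairs]

    observed = sum(frozenset(pair) in reference for pair in ip_pairs)

    counts = []
    for _ in range(n_iter):
        shuffled = preys[:]
        rng.shuffle(shuffled)
        counts.append(sum(frozenset((b, p)) in reference
                          for b, p in zip(baits, shuffled)))
    return observed, min(counts), sum(counts) / len(counts), max(counts)


def fold_enrichment(observed, expected_mean):
    """Observed intersection divided by the mean permuted intersection."""
    return observed / expected_mean


# Toy example with hypothetical protein names.
ip = [("A", "x"), ("B", "y"), ("C", "z")]
ref = [("x", "A"), ("B", "y")]
obs, lo, mean, hi = permuted_overlap(ip, ref, n_iter=200)
```

With the table's values, `fold_enrichment(149, 14.3)` gives about 10.4 and `fold_enrichment(78, 8.0)` gives 9.75, in line with the reported ∼10-fold figures.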
Figure 3. GO coincidence maps. Coincidence maps showing enrichment of bait–prey GO category combinations. Each bait–prey category combination is represented by a square in the matrix and colored according to the P-value from a pairwise statistical test (Fisher exact test) of association. (A) Bait–prey biological processes. (B) Randomly permuted bait–prey biological processes. (C) Cellular component categories. (D) Randomly permuted bait–prey cellular component categories.
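As an illustration of the test behind each square of the coincidence map, the sketch below computes a Fisher exact P-value for one bait–prey category pair from a 2×2 table. The caption does not say whether the test was one- or two-sided; the one-sided (enrichment) form is an assumption here, as are the function name `fisher_exact_greater` and the table layout.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout for one category pair (X, Y):
      a = interactions with bait in X and prey in Y
      b = interactions with bait in X and prey not in Y
      c = interactions with bait not in X and prey in Y
      d = interactions with bait not in X and prey not in Y
    Returns the probability of observing a or more in the top-left cell
    under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy counts, not taken from the study.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Repeating this for every bait category against every prey category would yield the matrix of P-values that the map colors.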
Figure 4. Comparison of interaction data sets to gene co-expression data. Red and green fractions of each bar correspond respectively to the proportions of positive and negative co-expression correlations for each data set. The numbers above each column represent the numbers of co-expression measurements overlapping the respective data set, and the numbers in parentheses represent the ratio of positive co-expression correlations to negative co-expression correlations. (1) The complete set of co-expression correlation measurements (Lee et al.). (2) The set of co-expression gene pairs mapping to one or more IP-HTMS baits. (3) The set of IP-HTMS bait–prey pairs for which a co-expression measurement is available. (4) The set of Y2H (Rual et al.) interactions for which a co-expression measurement is available. (5) The set of known (Ramani et al.) interactions for which a co-expression measurement is available.
Figure 5. LYAR interactors also show strong gene co-expression with LYAR. Box plot showing distribution of P-values for all genes coexpressed (in three or more studies) with LYAR. Red points indicate co-expression P-values for 12 LYAR IP-HTMS interactors. Interactor descriptions include known subcellular localizations in square brackets where available.
Figure 6 (A, B). Global and focused views of human interaction map. (A) Complete bait–bait connectivity map for 323 human bait proteins. Baits are represented as nodes in the graph. The size of the node represents the number of prey proteins identified for the bait. The thickness of edges between nodes represents the proportion of preys in common between the baits. Nodes are colored according to a combined disease and biological process classification, and selected classes indicated in the legend. (B) Focused views of selected bait–bait subnetworks (cross-referenced by roman numerals to panel A).
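The node sizes and edge weights of the bait–bait map in panel A can be sketched as below. The caption does not define "proportion of preys in common" precisely; the Jaccard index (shared preys divided by the union of both prey sets) is an assumption used here, and the names `shared_prey_edges` and `preys_by_bait` are hypothetical.

```python
from itertools import combinations


def shared_prey_edges(preys_by_bait):
    """Map each bait pair with shared preys to its shared-prey proportion.

    Assumption: proportion = |shared preys| / |union of preys| (Jaccard index).
    Bait pairs with no prey in common get no edge.
    """
    edges = {}
    for b1, b2 in combinations(sorted(preys_by_bait), 2):
        p1, p2 = set(preys_by_bait[b1]), set(preys_by_bait[b2])
        shared = p1 & p2
        if shared:
            edges[(b1, b2)] = len(shared) / len(p1 | p2)
    return edges


# Toy input with hypothetical bait and prey names.
preys_by_bait = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"w"}}
node_sizes = {bait: len(preys) for bait, preys in preys_by_bait.items()}
edges = shared_prey_edges(preys_by_bait)
```

Here `node_sizes` plays the role of node size and the values in `edges` the role of edge thickness; other normalizations, such as dividing by the smaller prey set, would be equally consistent with the caption.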
Figure 6 (C–F). Complete interaction networks (representing both baits and preys) for selected groups of baits. Nodes are colored according to cellular component or biological process as indicated on each figure. Baits are shown as large, labeled oval shapes, preys as small, labeled oval shapes. Arrow direction indicates a bait–prey relationship and line thickness indicates the interaction confidence score (see legend in panel C). Preys are grouped according to the baits with which they were identified (except panel E where they are grouped according to interaction confidence score). (C) Proteasome baits (corresponds to bait–bait cluster B (panel iv)). (D) Sumoylation pathway (corresponds to bait–bait cluster B (panel vi)). (E) Nek6. (F) Translation initiation and elongation (corresponds to bait–bait cluster B (panel iii)).