G Traver Hart, Insuk Lee, Edward R Marcotte.
Abstract
BACKGROUND: Identifying all protein complexes in an organism is a major goal of systems biology. In the past 18 months, the results of two genome-scale tandem affinity purification-mass spectrometry (TAP-MS) assays in yeast have been published, along with corresponding complex maps. For most complexes, the published data sets were surprisingly uncorrelated. It is therefore useful to consider the raw data from each study and generate an accurate complex map from a high-confidence data set that integrates the results of these and earlier assays.
Year: 2007 PMID: 17605818 PMCID: PMC1940025 DOI: 10.1186/1471-2105-8-236
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Applying the matrix-model scoring algorithm. The four subunits of the DNA primase core complex are detected using the scoring algorithm. (A) In the Gavin et al. TAP-MS data set, Pol1 and Pol12 were purified as bait; their corresponding bait-prey (spoke-model) interactions are shown in blue, with the number of additional prey identified shown in parentheses. In the Krogan et al. assay (shown in orange), the same baits plus Pri1 were purified. (B) In the matrix model, both bait-prey and prey-prey interactions are considered. Within a given data set, the total number of links observed between each pair of proteins is recorded and the P-value calculated as described in the text. The PICO network was generated by multiplying P-values for the same interaction derived from different data sets; e.g., Pol1–Pol12 is discovered in both Gavin and Krogan and scored accordingly. (C) The PICO network integrates probability scores from all data sources, here represented as -ln(P-value). Values in black are final PICO scores; separate scores from Gavin et al. (blue) and Krogan et al. (orange) are shown where applicable. No data from Ho et al. were relevant to this example.
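The scoring steps in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy purifications, and the example P-values are hypothetical, and the hypergeometric parametrization (n and m as the link totals of the two proteins, N as the total in the data set) is an assumption, since the caption defers the exact definition to the main text.

```python
import math
from collections import Counter
from itertools import combinations

def matrix_model_counts(purifications):
    """Count links per protein pair; every pair within one purification
    (bait-prey and prey-prey alike) contributes one link."""
    counts = Counter()
    for proteins in purifications:
        for a, b in combinations(sorted(set(proteins)), 2):
            counts[(a, b)] += 1
    return counts

def hypergeom_tail(k, n, m, N):
    """P(X >= k) for a hypergeometric draw of m items from N, n of which
    are 'successes'. Assumed parametrization, see the lead-in."""
    upper = min(n, m)
    total = sum(math.comb(n, i) * math.comb(N - n, m - i)
                for i in range(k, upper + 1))
    return total / math.comb(N, m)

def combine_scores(per_dataset_pvalues):
    """Multiply P-values for the same pair across data sets and report
    -ln(product), which equals the sum of the -ln(P) terms."""
    scores = {}
    for pvalues in per_dataset_pvalues:
        for pair, p in pvalues.items():
            scores[pair] = scores.get(pair, 0.0) - math.log(p)
    return scores

# Hypothetical toy purifications, not the published data.
toy = [["POL1", "POL12", "PRI1"], ["POL12", "POL1", "PRI2"]]
counts = matrix_model_counts(toy)

# Hypothetical per-data-set P-values for one pair.
combined = combine_scores([{("POL1", "POL12"): 1e-4},
                           {("POL1", "POL12"): 1e-3}])
```

Summing the -ln(P) terms rather than multiplying the raw P-values avoids floating-point underflow when many small probabilities are combined.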
Figure 2. Performance curves of the probabilistic scoring method. We measured the performance of the various datasets against a reference set consisting of a matrix-model interaction set generated from MIPS curated complexes, excluding the large and small ribosomal subunits (which would otherwise account for over half of the interactions in this set). Single points represent an entire dataset. Curves represent a dataset that has been scored using the hypergeometric scoring algorithm, rank ordered, and plotted with each symbol representing the cumulative addition of the 500 next highest scoring interactions (i.e. the tail of the curve represents the entire dataset). The scoring scheme outperforms the raw data as well as the filtered, published sets in all cases; the integrated PICO net outperforms the individual scored data sets, and the derived complexes are slightly more accurate than PICO (for all thresholds; data not shown).
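The curve construction described here (rank order, then one point per 500 additional interactions) might look like the sketch below. The caption does not state the plotted axes, so reporting the cumulative count, the number of reference hits, and their ratio is an assumption; `performance_curve` and the toy scores are hypothetical, and pairs are assumed to be stored in one canonical order in both inputs.

```python
def performance_curve(scored, reference, step=500):
    """Return (n_interactions, n_in_reference, fraction_in_reference)
    after each block of `step` top-ranked interactions, plus a final
    point covering the whole data set."""
    ranked = sorted(scored, key=scored.get, reverse=True)  # best score first
    points = []
    hits = 0
    for i, pair in enumerate(ranked, start=1):
        if pair in reference:
            hits += 1
        if i % step == 0 or i == len(ranked):
            points.append((i, hits, hits / i))
    return points

# Hypothetical scores (higher is better) and reference set.
scores = {("A", "B"): 9.0, ("A", "C"): 7.0, ("B", "C"): 3.0,
          ("C", "D"): 1.0, ("D", "E"): 0.5}
reference = {("A", "B"), ("B", "C")}
curve = performance_curve(scores, reference, step=2)
```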
Figure 3. Effect of thresholds on network size and derived complex accuracy. (A) Interactions in the PICO network were rank ordered, and the E-value was calculated as the sum of P-values. The number of interactions at each E-value threshold was counted; the total decreases as an increasingly stringent threshold is applied. (B) At each E-value threshold, the subset of interactions was clustered with MCL with parameters that optimized correlation with the filtered set of GO component annotations [see Methods]. The correlation with GO component (filled circles) and MIPS complexes (hollow circles) generally improves with the stringency of the E-value cutoff. We judged that the 10^-2 cutoff provides a reasonable tradeoff between increasing accuracy and decreasing coverage, and chose this subset for further study.
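One way to read panel (A) is that the E-value at a given rank is the running sum of P-values up to that rank, so a threshold keeps the longest top-ranked prefix whose sum stays below it. The sketch below follows that reading, which is an assumption; the function name and toy P-values are hypothetical.

```python
from itertools import accumulate

def interactions_within_e_value(p_values, threshold):
    """Keep the top-ranked interactions whose cumulative P-value sum
    (taken here as the E-value) does not exceed `threshold`."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])  # smallest P first
    running = accumulate(p for _, p in ranked)
    return [pair for (pair, _), e in zip(ranked, running) if e <= threshold]

# Hypothetical P-values keyed by interaction label.
toy_p = {"a": 1e-6, "b": 1e-4, "c": 5e-3, "d": 6e-3, "e": 0.2}
kept_loose = interactions_within_e_value(toy_p, 1e-2)
kept_strict = interactions_within_e_value(toy_p, 1e-3)
```

Because the running sum never decreases, a stricter threshold always returns a prefix of the looser result, which matches the shrinking network size in panel (A).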
Figure 4. A subset of the E-2 complex map. After applying the E = 10^-2 threshold to the PICO interaction set, the subset of 5,352 interactions was clustered with MCL, using parameters that maximized correlation with a filtered set of GO component annotations. Interactions within clusters (4,411) were plotted with Cytoscape using the included "organic" layout algorithm. Interactions between clusters (941) were omitted for clarity. Yellow nodes indicate essential proteins; red, nonessential. For the full image please see Additional File 4.
Figure 5. Inter-complex interactions. Interactions in the E-2 complex map represent 4,411 of the 5,352 interactions in the PICO network at the E = 10^-2 threshold. The 941 remaining protein-protein interactions (PPI) collapse to 248 complex-complex interactions. Here we map 128 inter-complex interactions, each comprising two or more protein-protein interactions (821 PPI total); singletons are omitted for clarity. Nodes represent E-2 complexes: yellow indicates >70% essential subunits; labels indicate highest-scoring GO component, where applicable. Edge thickness reflects the number of interactions between complex subunits, ranging from two (thinnest) to 24 or more (thickest) PPI; the number of interactions is shown on each edge. The density of PPI between complexes of similar function (e.g. 190 PPI from the U4/U6/U5 tri-snRNP complex to neighbors; 86 PPI between the C20/C30/C44/C78 ribosome biogenesis modules; 64 PPI linking the C17 histone-associated complex to neighbors; shaded in blue) illustrates the hierarchical nature of the yeast complex network.
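Collapsing protein-level interactions into complex-level edges, and dropping singleton edges as the caption describes, can be sketched as follows. The function name, the `membership` mapping, and the toy data are hypothetical; only the rule (count PPI per complex pair, keep pairs with two or more) comes from the caption.

```python
from collections import Counter

def collapse_to_complex_edges(interactions, membership, min_ppi=2):
    """Map each inter-complex protein pair to its pair of complexes and
    count PPI per complex pair; edges with fewer than `min_ppi` are dropped."""
    edge_counts = Counter()
    for a, b in interactions:
        ca, cb = membership.get(a), membership.get(b)
        if ca is None or cb is None or ca == cb:
            continue  # unclustered protein or intra-complex interaction
        edge_counts[tuple(sorted((ca, cb)))] += 1
    return {edge: n for edge, n in edge_counts.items() if n >= min_ppi}

# Hypothetical cluster assignments and interactions.
membership = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2", "P5": "C3"}
ppi = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P5")]
edges = collapse_to_complex_edges(ppi, membership)
```

The returned counts correspond to the edge labels and thicknesses in the figure; setting `min_ppi=1` would retain the singleton edges that the figure omits.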
Figure 6. Essential proteins are concentrated in a subset of complexes. The distribution of essential proteins in complexes was compared to a randomized background. The fraction of essential proteins in each complex was calculated, sorted into equal-sized bins, and compared to an expected background generated by randomly assigning essential proteins to the same set of complexes. The log ratio of observed to expected frequency for each bin is plotted here: positive values indicate observed frequency above random; negative values indicate below random. The distribution illustrates the concentration of essential proteins in some complexes, and a corresponding absence of essentials in others. Bars marked with an asterisk represent statistically significant deviations from random expectation (P < 10^-3).
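The randomization behind this figure can be approximated with a short permutation sketch. Several details are assumptions because the caption does not give them: bins are taken as equal-width intervals of the essential fraction, the log is natural, the background is estimated from 1,000 random reassignments, and all names and toy inputs are hypothetical. The significance test for the asterisked bars is not reproduced here.

```python
import math
import random
from collections import Counter

def essential_fraction_bins(complexes, essential, n_bins=10):
    """Histogram of per-complex essential fractions over equal-width bins."""
    hist = Counter()
    for members in complexes:
        frac = sum(m in essential for m in members) / len(members)
        hist[min(int(frac * n_bins), n_bins - 1)] += 1
    return [hist[i] for i in range(n_bins)]

def log_ratio_vs_random(complexes, essential, n_bins=10, n_trials=1000, seed=0):
    """ln(observed / expected) per bin; expected is the mean histogram when
    the same number of essential labels is reassigned at random.
    Bins with a zero observed or expected count yield None."""
    rng = random.Random(seed)
    proteins = sorted({m for members in complexes for m in members})
    n_essential = sum(p in essential for p in proteins)
    observed = essential_fraction_bins(complexes, essential, n_bins)
    expected = [0.0] * n_bins
    for _ in range(n_trials):
        shuffled = set(rng.sample(proteins, n_essential))
        trial = essential_fraction_bins(complexes, shuffled, n_bins)
        for i, count in enumerate(trial):
            expected[i] += count / n_trials
    return [math.log(o / e) if o > 0 and e > 0 else None
            for o, e in zip(observed, expected)]

# Hypothetical toy input: one fully essential and one fully nonessential complex.
toy_complexes = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
toy_essential = {"a", "b", "c", "d"}
ratios = log_ratio_vs_random(toy_complexes, toy_essential)
```

In the toy input the essential proteins are fully segregated, so the top and bottom bins sit well above the random expectation.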
Essential Complexes. Selected essential complexes from the E-2 complex set. Complexes listed are composed of at least 4 subunits, of which >70% are essential. For each complex, the table lists the E-2 complex identifier, the size of the complex, the fraction of essential proteins, the most significant GO cellular component annotation for the complex, and the list of proteins in the complex. Twenty-six percent of all essential genes in yeast are represented in these complexes.
| Complex | Size | Essential fraction | GO cellular component | Proteins |
| --- | --- | --- | --- | --- |
| C1 | 35 | 74% | DNA-directed RNA polymerase III complex | DST1, IWR1, RET1, RPA12, RPA135, RPA14, RPA190, RPA34, RPA43, RPA49, RPB10, RPB11, RPB2, RPB3, RPB4, RPB5, RPB7, RPB8, RPB9, RPC11, RPC17, RPC19, RPC25, RPC31, RPC34, RPC37, RPC40, RPC53, RPC82, RPO21, RPO26, RPO31, SPT4, TFG1, TFG2 |
| C4 | 27 | 93% | small nucleolar ribonucleoprotein complex | BMS1, DIP2, ECM16, EMG1, IMP3, MPP10, NAN1, NOC4, NOP14, POL5, PWP2, SOF1, UTP10, UTP13, UTP14, UTP15, UTP18, UTP20, UTP21, UTP30, UTP4, UTP5, UTP6, UTP7, UTP8, UTP9, YGR210C |
| C11 | 20 | 75% | mRNA cleavage and polyadenylation specificity factor complex | BUD14, CFT1, CFT2, FIP1, GIP3, GLC7, GLC8, MPE1, PAP1, PFS2, PTA1, PTI1, REF2, SDS22, SSU72, SWD2, SYC1, YPI1, YSH1, YTH1 |
| C12 | 20 | 85% | U4/U6 × U5 tri-snRNP complex | AAR2, BRR2, DIB1, LEA1, LSM8, PRP11, PRP21, PRP3, PRP31, PRP38, PRP4, PRP6, PRP8, PRP9, RSE1, SMX2, SNU114, SNU23, SNU66, SPP381 |
| C13 | 18 | 72% | proteasome core complex, alpha-subunit complex (sensu Eukaryota) | FLC2, GRH1, OSM1, PRE1, PRE10, PRE2, PRE3, PRE4, PRE5, PRE6, PRE7, PRE8, PRE9, PUP1, PUP2, PUP3, RED1, SCL1 |
| C14 | 18 | 72% | snRNP U1 | BRR1, LUC7, MUD1, NAM8, PRP39, PRP40, PRP42, SMB1, SMD1, SMD2, SMD3, SME1, SMX3, SNP1, SNU56, SNU71, STO1, YHC1 |
| C20 | 13 | 77% | (no significant annotation) | BRX1, CIC1, DRS1, ERB1, FPR4, HAS1, MAK5, NOC2, NOC3, PUF6, PWP1, RRP5, YTM1 |
| C26 | 11 | 73% | eukaryotic translation initiation factor 2B complex | CDC123, GCD1, GCD11, GCD2, GCD6, GCD7, GCN3, PET111, SUI2, SUI3, YVH1 |
| C30 | 10 | 90% | (no significant annotation) | EBP2, MRT4, NOG1, NOP15, NOP2, NOP7, NUG1, RLP7, RPF2, TIF6 |
| C38 | 8 | 88% | nuclear pore | GLE2, NIC96, NSP1, NUP116, NUP159, NUP49, NUP57, NUP82 |
| C41 | 8 | 88% | DASH complex | ASK1, DAD1, DAD2, DAD3, DAM1, DUO1, SPC19, SPC34 |
| C42 | 8 | 100% | exocyst | EXO70, EXO84, SEC10, SEC15, SEC3, SEC5, SEC6, SEC8 |
| C44 | 8 | 100% | (no significant annotation) | DBP10, NIP7, NSA1, RIX7, RPF1, RRP1, SPB1, SPB4 |
| C46 | 7 | 86% | Arp2/3 protein complex | ARC15, ARC18, ARC19, ARC35, ARC40, ARP2, ARP3 |
| C48 | 7 | 71% | DNA replication factor C complex | CTF18, ELG1, RFC1, RFC2, RFC3, RFC4, RFC5 |
| C53 | 7 | 100% | transcription factor TFIIH complex | CCL1, KIN28, RAD3, SSL1, TFB1, TFB3, TFB4 |
| C54 | 7 | 86% | signal recognition particle (sensu Eukaryota) | LHP1, SEC65, SRP14, SRP21, SRP54, SRP68, SRP72 |
| C55 | 7 | 100% | nucleolar ribonuclease P complex | POP1, POP3, POP4, POP5, POP7, POP8, RPP1 |
| C65 | 6 | 100% | nuclear origin of replication recognition complex | ORC1, ORC2, ORC3, ORC4, ORC5, ORC6 |
| C67 | 6 | 100% | transcription factor TFIIIC complex | TFC1, TFC3, TFC4, TFC6, TFC7, TFC8 |
| C72 | 6 | 83% | (no significant annotation) | DSL1, SEC22, SEC39, TIP20, UFE1, USE1 |
| C74 | 6 | 100% | chaperonin-containing T-complex | CCT2, CCT3, CCT4, CCT5, CCT6, TCP1 |
| C78 | 5 | 100% | (no significant annotation) | IPI1, IPI3, RIX1, RSA4, SDA1 |
| C79 | 5 | 100% | nuclear cohesin complex | CDC5, IRR1, MCD1, SMC1, SMC3 |
| C85 | 5 | 80% | GINS complex | CTF4, PSF1, PSF2, PSF3, SLD5 |
| C86 | 5 | 100% | nuclear condensin complex | BRN1, SMC2, SMC4, YCG1, YCS4 |
| C89 | 5 | 80% | nucleolar preribosome, small subunit precursor | ENP1, HRR25, LTV1, RIO2, TSR1 |
| C101 | 4 | 100% | MIND complex | DSN1, MTW1, NNF1, NSL1 |
| C106 | 4 | 100% | alpha DNA polymerase:primase complex | POL1, POL12, PRI1, PRI2 |
| C110 | 4 | 75% | (no significant annotation) | CIA1, MET18, NAR1, YHR122W |
| C111 | 4 | 75% | (no significant annotation) | NAB3, NAB6, NRD1, SEN1 |
| C115 | 4 | 100% | mRNA cleavage factor complex | CLP1, PCF11, RNA14, RNA15 |
| C124 | 4 | 75% | transcription factor TFIIE complex | DBP2, PPN1, TFA1, TFA2 |
| C92 | 4 | 75% | outer plaque of spindle pole body | SPC72, SPC97, SPC98, TUB4 |
| C93 | 4 | 100% | Ndc80 complex | NUF2, SPC24, SPC25, TID3 |
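The selection rule stated in the table caption (at least 4 subunits, more than 70% essential) can be expressed as a small filter. This is a sketch only: the function name is hypothetical, complex `X1` and its members are invented for illustration, and the essential set shown contains only the four C106 subunits, which the table reports as 100% essential.

```python
def select_essential_complexes(complexes, essential, min_size=4, min_fraction=0.70):
    """Return (complex_id, size, essential_fraction) for complexes with at
    least `min_size` subunits and more than `min_fraction` essential."""
    selected = []
    for cid, members in complexes.items():
        if len(members) < min_size:
            continue
        frac = sum(m in essential for m in members) / len(members)
        if frac > min_fraction:
            selected.append((cid, len(members), frac))
    # Largest complexes first, ties broken by identifier.
    return sorted(selected, key=lambda row: (-row[1], row[0]))

toy_complexes = {
    "C106": ["POL1", "POL12", "PRI1", "PRI2"],  # from the table above
    "X1": ["G1", "G2", "G3", "G4"],             # hypothetical complex
}
toy_essential = {"POL1", "POL12", "PRI1", "PRI2"}
rows = select_essential_complexes(toy_complexes, toy_essential)
```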
Figure 7