Shuye Pu, Jessica Wong, Brian Turner, Emerson Cho, Shoshana J Wodak.
Abstract
Gold standard datasets on protein complexes are key to inferring and validating protein-protein interactions. Despite much progress in characterizing protein complexes in the yeast Saccharomyces cerevisiae, numerous researchers still use as reference the manually curated complexes catalogued by the Munich Information Center for Protein Sequences (MIPS) database. Although this catalogue has served the community extremely well, it no longer reflects the current state of knowledge. Here, we report two catalogues of yeast protein complexes resulting from systematic curation efforts. The first, denoted CYC2008, is a comprehensive catalogue of 408 manually curated heteromeric protein complexes reliably backed by small-scale experiments reported in the current literature. This catalogue represents an up-to-date reference set for biologists interested in discovering protein interactions and protein complexes. The second catalogue, denoted YHTP2008, comprises 400 high-throughput complexes annotated with current literature evidence. Among them, 262 correspond, at least partially, to CYC2008 complexes. Evidence for interacting subunits was collected for 68 complexes that have only partial or no overlap with CYC2008 complexes, whereas no literature evidence was found for 100 complexes. Some of these partially supported and as yet unsupported complexes may be interesting candidates for experimental follow-up. Both catalogues are freely available at: http://wodaklab.org/cyc2008/.
Year: 2008 PMID: 19095691 PMCID: PMC2647312 DOI: 10.1093/nar/gkn1005
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Sources of information on yeast protein complexes
| Complexes | Source |
|---|---|
| MIPS | |
| SGD | |
| BioGrid | Reguly |
| YHTP2008 | Pu |
| Gavin | Gavin |
| Krogan | Krogan |
The MIPS complexes were taken at the leaf level of the hierarchical scheme, excluding homomeric complexes and complexes bearing the Systematic Analysis Code 550. The SGD complexes (file name at the SGD ftp site: go_protein_complex_slim.tab) represent a mapping of gene products to the direct children of the ‘Macromolecular complex’ GO term (GOID:32991). The BioGrid complexes can be found in the online supplementary materials of Reguly et al. (17): Supplementary Table 2, Co-purified complexes in the literature curated (LC) dataset.
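The MIPS selection rules above (leaf level only, no homomeric complexes, nothing under code 550) can be illustrated with a small filter. This is a minimal sketch, not the authors' pipeline. It assumes a hypothetical in-memory catalogue mapping dotted MIPS-style category IDs (e.g. `510.190`) to sets of gene names, treats an ID as a leaf when no other ID extends it with a further `.` component, and approximates "homomeric" as a complex with fewer than two distinct subunits. The function name `select_mips_leaf_complexes` is hypothetical.

```python
from typing import Dict, Set


def select_mips_leaf_complexes(catalogue: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep leaf-level, heteromeric complexes outside the '550' branch.

    Assumption (hypothetical format): IDs are dotted strings, and a child
    category extends its parent's ID with '.<number>'.
    """
    ids = list(catalogue)
    selected: Dict[str, Set[str]] = {}
    for cid, subunits in catalogue.items():
        # Leaf: no other ID uses this one as a dotted prefix.
        is_leaf = not any(other.startswith(cid + ".") for other in ids)
        # Exclude the Systematic Analysis branch (code 550).
        in_550 = cid == "550" or cid.startswith("550.")
        # Heteromeric approximated as at least two distinct subunits.
        heteromeric = len(set(subunits)) >= 2
        if is_leaf and not in_550 and heteromeric:
            selected[cid] = set(subunits)
    return selected
```

For a toy catalogue `{"510": {"A", "B", "C"}, "510.10": {"A", "B"}, "510.20": {"C"}, "550.1": {"D", "E"}}`, only `510.10` survives: `510` is not a leaf, `510.20` has a single subunit, and `550.1` belongs to the excluded branch.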
Figure 1. Quantifying the correspondence of CYC2008 complexes with those in other sets of yeast protein complexes. Each complex in CYC2008 is mapped onto complexes in six other sets: MIPS, SGD, BioGrid, Krogan, Gavin and YHTP2008 (see Table 1 for details of these datasets). A Jaccard index is computed for each pair of matching complexes to quantify the extent of overlap between their components (see Materials and methods section for details). A Jaccard index of 1 indicates that two complexes are identical in subunit composition, and an index of 0 means no overlap at all. This figure shows, for example, that only 165 CYC2008 complexes are nearly identical (0.9 < Jaccard index ⩽ 1.0) to SGD complexes, while 93 CYC2008 complexes have no overlap with any SGD complex (Jaccard index = 0).
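The Jaccard index used in Figure 1 is the size of the intersection of two subunit sets divided by the size of their union. The sketch below computes it and maps a query complex to its best-scoring partner in another catalogue. Keeping only the single highest-scoring match is an assumption made here for illustration; the exact matching procedure is described in the paper's Materials and methods section. The names `jaccard` and `best_match` are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 means identical composition, 0.0 means no shared subunits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_match(query: Set[str], reference: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return (complex ID, Jaccard index) of the reference complex closest to `query`.

    Assumption: only the single best-scoring match is kept; returns ("", 0.0)
    when the query shares no subunit with any reference complex.
    """
    best_id, best_score = "", 0.0
    for rid, subunits in reference.items():
        score = jaccard(query, subunits)
        if score > best_score:
            best_id, best_score = rid, score
    return best_id, best_score
```

With `reference = {"x": {"A", "B"}, "y": {"A", "B", "C"}}`, the query `{"A", "B", "C"}` scores 2/3 against `x` and 1.0 against `y`, so it maps to `y`. Counting how many query complexes land above 0.9 or exactly at 0 gives the kind of summary quoted in the caption.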
Subunit sharing between complexes within different complex catalogues
| Complex sets | CYC2008 | MIPS | SGD | BioGrid | YHTP2008 | Krogan | Gavin |
|---|---|---|---|---|---|---|---|
| Number of complexes | 408 | 215 | 293 | 234 | 400 | 547 | 491 |
| Number of overlapping pairs | 430 | 168 | 1052 | 1206 | 194 | 0 | 19576 |
| Average overlap per complex | 1.054 | 0.781 | 3.542 | 5.154 | 0.485 | 0.000 | 39.870 |
| Average number of shared genes | 2.047 | 2.417 | 8.781 | 2.488 | 2.505 | 0.000 | 2.399 |
| Fraction of overlapping complexes | 0.436 | 0.414 | 0.670 | 0.744 | 0.195 | 0.000 | 0.969 |
The complex catalogues are the same as those considered in Figure 1 and Table 1 (see legend of Figure 1 and Table 1 for details). The overlap between complexes was computed as described in Materials and methods section. Number of overlapping pairs: number of complex pairs sharing subunits. Average overlap per complex: average number of other complexes with which a given complex shares subunits. Average number of shared genes: average number of subunits shared between pairs of overlapping complexes. Fraction of overlapping complexes: fraction of complexes that share subunits with one or more other complexes.
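The four statistics in the table above can be computed by comparing every pair of complexes within one catalogue. This is a minimal sketch assuming a hypothetical dict of complex ID to subunit set; the function name `overlap_stats` is hypothetical. Note that "average overlap per complex" is computed here as overlapping pairs divided by number of complexes, because that ratio reproduces most tabulated values (for instance, CYC2008: 430/408 ≈ 1.054; MIPS: 168/215 ≈ 0.781).

```python
from itertools import combinations
from typing import Dict, Set


def overlap_stats(catalogue: Dict[str, Set[str]]) -> Dict[str, float]:
    """Subunit-sharing statistics for one catalogue of complexes."""
    ids = list(catalogue)
    n = len(ids)
    pairs = 0            # complex pairs sharing at least one subunit
    shared_total = 0     # summed number of shared subunits over those pairs
    involved = set()     # complexes that overlap with at least one other
    for a, b in combinations(ids, 2):
        shared = len(catalogue[a] & catalogue[b])
        if shared:
            pairs += 1
            shared_total += shared
            involved.update((a, b))
    return {
        "overlapping_pairs": pairs,
        # Ratio matching the table's values: pairs / number of complexes.
        "avg_overlap_per_complex": pairs / n if n else 0.0,
        "avg_shared_genes": shared_total / pairs if pairs else 0.0,
        "fraction_overlapping": len(involved) / n if n else 0.0,
    }
```

A non-overlapping partition, such as a clustering that assigns each protein to exactly one complex, yields zeros for every statistic, which is the pattern seen in the Krogan column.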
Percentages of co-complex associations derived from different catalogues of yeast protein complexes that overlap with PPIs archived in the BioGrid database (17)
| Curated set | Number of genes in complexes | Number of co-complex pairs | Overlap with BioGrid |
|---|---|---|---|
| CYC2008 | 1630 | 11 327 | 53% |
| MIPS | 1194 | 11 014 | 38% |
| SGD | 1816 | 251 058 | 4.3% |
Co-complex associations are all pairwise links between proteins belonging to the same complex. They are computed here for the three curated complex catalogues (MIPS, SGD and CYC2008) considered in this study and detailed in Table 1. The inordinately large number of co-complex pairs derived from the SGD complexes stems from the fact that these complexes tend to represent functional groups rather than physical complexes, as discussed in the text.
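A complex with n subunits contributes n(n−1)/2 co-complex pairs, so a 100-member functional group alone yields 4,950 pairs; this quadratic growth is why broad SGD groupings inflate the count. The sketch below derives the pairs from a catalogue and measures what fraction of them appear among physical interactions. It is illustrative only: it assumes a hypothetical dict of complex ID to subunit set and a plain list of interacting gene pairs, counts a pair shared by several complexes once, and ignores self-interactions. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def co_complex_pairs(catalogue: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """All unordered protein pairs that share membership in some complex."""
    pairs: Set[FrozenSet[str]] = set()
    for subunits in catalogue.values():
        for a, b in combinations(sorted(subunits), 2):
            pairs.add(frozenset((a, b)))  # unordered, deduplicated across complexes
    return pairs


def fraction_in_ppis(pairs: Set[FrozenSet[str]], ppis: Iterable[Tuple[str, str]]) -> float:
    """Fraction of co-complex pairs also present in a PPI list (direction ignored)."""
    ppi_set = {frozenset(p) for p in ppis if p[0] != p[1]}
    if not pairs:
        return 0.0
    return len(pairs & ppi_set) / len(pairs)
```

For `{"c1": {"A", "B", "C"}, "c2": {"C", "D"}}` there are four co-complex pairs (A–B, A–C, B–C, C–D); against interactions `[("B", "A"), ("C", "D"), ("X", "Y")]`, two of them are supported, giving 0.5.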