Panorama of ancient metazoan macromolecular complexes.

Cuihong Wan^1,2, Blake Borgeson², Sadhna Phanse¹, Fan Tu², Kevin Drew², Greg Clark³, Xuejian Xiong^4,5, Olga Kagan¹, Julian Kwan^1,4, Alexandr Bezginov³, Kyle Chessman^4,5, Swati Pal⁵, Graham Cromar^4,5, Ophelia Papoulas², Zuyao Ni¹, Daniel R Boutz², Snejana Stoilova¹, Pierre C Havugimana¹, Xinghua Guo¹, Ramy H Malty⁶, Mihail Sarov⁷, Jack Greenblatt^1,4, Mohan Babu⁶, W Brent Derry^4,5, Elisabeth R Tillier³, John B Wallingford^2,8, John Parkinson^4,5, Edward M Marcotte^2,8, Andrew Emili^1,4.

Abstract

Macromolecular complexes are essential to conserved biological processes, but their prevalence across animals is unclear. By combining extensive biochemical fractionation with quantitative mass spectrometry, here we directly examined the composition of soluble multiprotein complexes among diverse metazoan models. Using an integrative approach, we generated a draft conservation map consisting of more than one million putative high-confidence co-complex interactions for species with fully sequenced genomes that encompasses functional modules present broadly across all extant animals. Clustering reveals a spectrum of conservation, ranging from ancient eukaryotic assemblies that have probably served cellular housekeeping roles for at least one billion years, ancestral complexes that have accrued contemporary components, and rarer metazoan innovations linked to multicellularity. We validated these projections by independent co-fractionation experiments in evolutionarily distant species, affinity purification and functional analyses. The comprehensiveness, centrality and modularity of these reconstructed interactomes reflect their fundamental mechanistic importance and adaptive value to animal cell systems.

Substances:
Multiprotein Complexes

Year: 2015 PMID: 26344197 PMCID: PMC5036527 DOI: 10.1038/nature14877

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Elucidating the components, conservation and functions of multiprotein complexes is essential to understand cellular processes[1,2], but mapping physical association networks on a proteome-wide scale is challenging. The development of high-throughput methods for systematically determining protein-protein interactions (PPI) has led to global molecular interaction maps for model organisms including E. coli, yeast, worm, fly and human[3-10]. In turn, comparative analyses have shown that PPI networks tend to be conserved[11,12], evolve more slowly than regulatory networks[13], and closely mirror function retention across orthologous groups[11,14,15]. Yet fundamental questions arise[16,17]: To what extent are physical interactions preserved between phyla? Which protein complexes are evolutionarily stable across animals? What is unique about their composition, phylogenetic distribution and phenotypic significance? Since previous cross-species interactome comparisons, based on experimental data from different sources and methods, show limited overlap[12,18], we sought to produce a more comprehensive and accurate map of protein complexes common to metazoa by applying a standardized approach to multiple species. We employed biochemical fractionation of native macromolecular assemblies followed by tandem mass spectrometry to elucidate protein complex membership (Fig. 1; see Extended Methods). Previous application of this co-fractionation strategy to human cell lines preferentially identified vertebrate-specific protein complexes[6], so we selected eight additional species for study based on their relevance as model organisms, spanning roughly a billion years of evolutionary divergence (Fig. 1a). The resulting co-fractionation data (Fig. 1b) acquired for Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse), Strongylocentrotus purpuratus (sea urchin), and human were used to discover conserved interactions (Fig. 1c), while the data obtained for Xenopus laevis (frog), Nematostella vectensis (sea anemone), Dictyostelium discoideum (amoeba), and Saccharomyces cerevisiae (yeast) were used for independent validation. Details on the cell types, developmental stages, and fractionation procedures used are provided in Supplementary Table 1.

Figure 1

Workflow

a, Phylogenetic relationships of organisms analyzed in this study. We fractionated soluble protein complexes from worm (C. elegans) larvae, fly (D. melanogaster) S2 cells, mouse (M. musculus) embryonic stem cells, sea urchin (S. purpuratus) eggs, and human (HEK293/HeLa) cell lines. Holdout species (‘T’, for test) likewise analyzed were frog (X. laevis), an amphibian; sea anemone (N. vectensis), a Cnidarian with primitive Eumetazoan tissue organization; slime mold (D. discoideum), an amoeba; and yeast (S. cerevisiae), a unicellular eukaryote. b, Protein fractions were digested and analysed by high performance liquid chromatography-tandem mass spectrometry (LC-MS/MS), measuring peptide spectral counts and precursor ion intensities. c, Integrative computational analysis: after ortholog mapping to human, correlation scores of co-eluting protein pairs detected in each ‘input’ species were subjected to machine learning together with additional external association evidence, using the CORUM complex database as a reference standard for training. High-confidence interactions were clustered to define co-complex membership.

We identified and quantified (see Extended Methods) 13,386 protein orthologs across 6,387 fractions obtained from 69 different experiments (Fig. 2a), an order of magnitude expansion in data coverage relative to our original (H. sapiens only) study[6]. Individual pair-wise protein associations were scored based on the fractionation profile similarity measured in each species. Next, we used an integrative computational scoring procedure (Fig. 1c; see Extended Methods) to derive conserved interactions for human proteins and their orthologs in worm, fly, mouse and sea urchin, defined as high pair-wise protein co-fractionation in at least two of the five input species. The support vector machine learning classifier used was trained (using 5-fold cross validation) on correlation scores obtained for conserved reference annotated protein complexes (see Extended Methods), and combined all of the input species co-fractionation data together with previously published human[6,19] and fly interactions[5] and additional supporting functional association evidence[20] (HumanNet). Notably, measurements of overall performance showed high precision with reasonable recall by the co-fractionation data alone (Fig. 2b), with external datasets serving only to increase precision and recall as we required all derived interactions to have significant biochemical support (see Extended Methods). Co-fractionation data of each input species impacted overall performance, in each case increasing precision and recall (Extended Data Fig. 1a).
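The pairwise scoring step described above can be sketched in a few lines. This is a minimal illustration only: it uses a plain Pearson correlation of elution profiles and a fixed cutoff, whereas the actual pipeline fed correlation scores to a support vector machine together with external evidence. The function names, the 0.8 threshold and the toy profiles are assumptions made for this example and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length elution profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat profile carries no co-elution evidence
    return sum(a * b for a, b in zip(dx, dy)) / denom

def supporting_species(profiles_by_species, protein_a, protein_b, threshold=0.8):
    """Species in which both proteins were detected and co-elute at or above the threshold."""
    hits = []
    for species, profiles in profiles_by_species.items():
        if protein_a in profiles and protein_b in profiles:
            if pearson(profiles[protein_a], profiles[protein_b]) >= threshold:
                hits.append(species)
    return hits

def is_conserved_candidate(profiles_by_species, a, b, threshold=0.8, min_species=2):
    """Apply the 'at least two input species' rule to one protein pair."""
    return len(supporting_species(profiles_by_species, a, b, threshold)) >= min_species

# Hypothetical spectral-count profiles (one value per fraction), keyed by human ortholog.
toy_profiles = {
    "human": {"A": [0, 5, 9, 4, 0], "B": [0, 4, 10, 5, 1], "C": [8, 2, 0, 0, 1]},
    "fly":   {"A": [1, 7, 3, 0, 0], "B": [0, 8, 4, 1, 0], "C": [0, 0, 1, 6, 9]},
    "worm":  {"A": [0, 0, 6, 2, 0], "C": [5, 1, 0, 0, 0]},
}

ab_support = supporting_species(toy_profiles, "A", "B")
ab_conserved = is_conserved_candidate(toy_profiles, "A", "B")
ac_conserved = is_conserved_candidate(toy_profiles, "A", "C")
```

In this toy data, proteins A and B co-elute in both human and fly and so pass the two-species rule, while A and C do not co-elute in any species.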

Figure 2

Derivation and projection of protein co-complex associations across taxa

a, Expanded coverage via experimental scale-up relative to our previous human study[6]. Chart shows number of proteins detected, most (63%) in two or more species. b, Performance benchmarks, measuring precision and recall of our method and data in identifying known co-complex interactions (annotated human complexes from CORUM[39]). Complexes were split into training and withheld test sets; 5-fold cross-validation against 4,528 interactions derived from the withheld test set shows strong performance gains, beyond baselines achieved using only co-fractionation or external evidence alone. c, Plots showing high enrichment (probability ratio of interacting) of predicted interacting orthologous protein pairs (relative to non-interacting pairs) among highly correlated fractionation profiles, in both the holdout validation (test, ‘T’) and input species (colors reflect clade memberships). d, (left) Representative co-fractionation data (normalized spectral counts shown for portions of 3 of 42 experimental profiles) from human, fly, and sea urchin showing characteristic profiles of proteasome core, base and lid subcomplexes. Hierarchical clustering (right) of pan-species pairwise Pearson correlation scores (centre) is consistent with accepted structural models (PDB id: 4CR2; core, red; base, blue; lid, green; out-clusters, white). e, Projection of conserved co-complex interactions across 122 eukaryotic species, indicating overlap with leading public PPI reference databases[39–41]. STRING bars indicate excess over CORUM; GeneMania bars indicate excess over both; component and interaction occurrences across clades indicated at bottom.

Extended Data Figure 1

Performance measures

a, Performance benchmarks, measuring the precision and recall of our method and data in identifying known co-complex interactions from a withheld reference set of annotated human complexes (from CORUM[39]; as in Fig. 2b). 5-fold cross-validation against this withheld set shows strong performance gains, beyond a baseline achieved using only human and mouse co-fractionation data along with additional evidence from independent protein interaction screens[5,19] and a functional gene network[20] (far-left curve), made by integrating co-fractionation data from the additional non-human animal species (as indicated). “All data” and “Fractionation data only” curves include biochemical fractionation data from all 5 input species: human, mouse, urchin, fly and worm; the latter curve omits all external data. In all cases, at least 2 species were required to show supporting biochemical evidence. Recall is shown as the fraction of 4,528 total positive interactions derived from the withheld human CORUM complexes. b, All 16,655 interactions were identified in at least two species, half (49%, 8,121) found in three or more species. c, Among these high-confidence co-complex interactions, 8,981 (54%) were not reported in iRef[44] (v13.0), Biogrid[45] (v3.2.119) or CORUM reference (Supplementary Table 2) for any of the five input species or in yeast; nearly half (46%, 4,128) of these novel co-complex interactions have co-fractionation evidence in 3 or more species. d, Final precision/recall performance on withheld interaction test set. An SVM classifier was trained using interactions derived from our training set of CORUM complexes, then ~1M protein pairs co-eluting in at least 2 of the 5 input species were scored by the classifier. Black curve shows precision and recall for ranked list of co-eluting pairs, with recall representing fraction recovered of 4,528 total positive interactions derived from the withheld set of merged human CORUM complexes, and precision measured using co-eluting pairs where both members of the pair are contained in the set of proteins represented in the CORUM withheld set. The top 16,655 pairs, giving a cumulative precision of 67.5% and recall of 23.0% on this withheld test set, form the high-confidence set of co-complex protein-protein interactions (blue circle). The highest-scoring interactions were clustered using the two-stage approach described in the Extended Methods, yielding a final set of 7,669 interactions which form the 981 identified complexes (red circle; precision=90.0%, recall=20.8%).
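The way precision and recall are computed along a ranked list in panel d can be illustrated with a short sketch. It follows the definitions in the caption (recall against a fixed count of withheld positives, precision restricted to pairs whose members both occur in the withheld benchmark protein set), but the function name, argument names and toy data are assumptions for illustration, not the authors' code.

```python
def precision_recall_at_k(ranked_pairs, positives, benchmark_proteins, total_positives, k):
    """Cumulative precision and recall over the top-k pairs of a ranked list.

    ranked_pairs       -- list of (protein, protein) tuples, best score first
    positives          -- set of frozensets: withheld reference interactions
    benchmark_proteins -- proteins represented in the withheld reference complexes
    total_positives    -- denominator for recall (4,528 in the caption above)
    """
    true_pos = 0
    scorable = 0  # pairs that can be judged: both members are benchmark proteins
    for a, b in ranked_pairs[:k]:
        if frozenset((a, b)) in positives:
            true_pos += 1
        if a in benchmark_proteins and b in benchmark_proteins:
            scorable += 1
    precision = true_pos / scorable if scorable else 0.0
    recall = true_pos / total_positives if total_positives else 0.0
    return precision, recall

# Hypothetical toy benchmark: two reference complexes {A, B, C} and {D, E}.
toy_positives = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]}
toy_benchmark = {"A", "B", "C", "D", "E"}
toy_ranked = [("A", "B"), ("A", "X"), ("B", "C"), ("A", "D"), ("D", "E")]

toy_precision, toy_recall = precision_recall_at_k(
    toy_ranked, toy_positives, toy_benchmark, total_positives=4, k=4
)
```

Pairs involving a protein outside the benchmark set (here, X) neither help nor hurt precision. As a sanity check on the reported figures, a recall of 23.0% of 4,528 reference interactions corresponds to roughly 1,040 recovered interactions.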

The final filtered interaction network consists of 16,655 high-confidence co-complex interactions in human (Supplementary Table 2). All of the interactions were supported by direct biochemical evidence in at least two input species, with half (8,121) detected in 3 or more (Extended Data Fig. 1b), enabling cross-species modeling and functional inference. Multiple lines of evidence support the quality of the network: Reference complexes withheld during training were reconstructed with higher precision and recall (Fig. 2b; see Extended Data Fig. 1c) relative to our human-only map[6]. The interacting proteins were also 6-fold enriched (hypergeometric p-value < 10^−24) for shared subcellular localization annotations in the Human Protein Atlas Database[21], 21-fold enriched (p-value < 10^−56) for shared disease associations in OMIM[22], and showed highly correlated human tissue proteome abundance profiles[23] (Extended Data Fig. 2a).
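The enrichment statistics quoted above combine a fold enrichment with a hypergeometric p-value. The sketch below shows one common way to compute both, assuming an upper-tail test on the number of interacting pairs that share an annotation. The exact test setup used in the study is not given in this section, so the framing, function names and toy counts are illustrative assumptions.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N, of which K are 'successes'."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def fold_enrichment(k, N, K, n):
    """Observed successes divided by the number expected by chance (n * K / N)."""
    return k / (n * K / N)

# Hypothetical toy counts: 20 candidate pairs in total, 5 of which share a localization
# annotation; 4 pairs are predicted to interact, and 3 of those share the annotation.
toy_p = hypergeom_upper_tail(k=3, N=20, K=5, n=4)
toy_fold = fold_enrichment(k=3, N=20, K=5, n=4)
```

With these toy counts the expected number of annotated pairs among the predictions is 1, so observing 3 gives a 3-fold enrichment.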

Extended Data Figure 2

Properties of protein elution profiles

a, Distribution of global protein tissue expression pattern similarity, measured as the Pearson correlation coefficient of protein abundance across 30 human tissues[23], showing markedly higher correlations for 16,468 protein-protein pairs of putative co-complex interaction partners compared to the same number of randomized pairs of proteins in the network which were not predicted to interact. b, Heatmap illustrating the low to moderate cross-species Spearman’s rank correlation coefficients in the elution profiles observed between orthologous proteins during mixed-bed ion exchange chromatography (IEX-HPLC) under standardized conditions, highlighting the shift in absolute chromatographic retention times in different species. This variation indicates that the conservation of co-fractionation by putatively interacting proteins is not merely a trivial result stemming from fixed column retention times. c, The degree of co-fractionation is measured as the correlation coefficient between elution profiles. Spatial proximity is calculated from the mean of residue pair distances between components of multisubunit complexes with known 3D structures (see Extended Methods).

To independently verify the reliability of these projections, we examined the co-fractionation profiles of putatively interacting orthologs (i.e., interologs) in the four holdout species, as obtained by protein quantification across 1,127 biochemical fractions (see Extended Methods). Strikingly, whereas sequence divergence changed absolute chromatographic retention times (Extended Data Fig. 2b), most of the predicted interactors showed highly correlated co-fractionation profiles among the holdout test species to a degree comparable to the input species used for learning (Fig. 2c). The biochemical data obtained for frog and sea anemone showed slightly better agreement than for Dictyostelium and yeast in proportion to evolutionary distance[24]. Besides indicating stably-associated proteins, our multi-species biochemical profiles faithfully recapitulated the architecture of multiprotein complexes of known 3D structure, with a general trend for most correlated protein pairs to be spatially closer (Extended Data Fig. 2c). For example, hierarchical clustering of 30S proteasome subunits according to chromatographic elution profiles of all five input species correctly separated the 20S and 19S particles and the regulatory lid from the base complex (Fig. 2d), reflecting known hierarchies of complex formation and disassembly. Since most of the interacting components were phylogenetically conserved across vast evolutionary timescales, we were able to predict over 1 million high-confidence co-complex interactions among orthologous protein pairs for 122 extant Eukaryotes with sequenced genomes (Supplementary Table 3). The number of interactions ranged from 8,000 to 15,000 per species depending on phylum (Fig. 2e), with more projected among Deuterostomes, Protostomes and Cnidaria, which show high component retention, and fewer in Fungi, Plants, and, especially, Protists, where the relative paucity of co-complex conservation likely reflects inherent clade diversity, especially in parasite genomes (e.g., gene loss among Apicomplexa). While largely congruent with previous smaller-scale studies of PPI conservation[25], the majority of conserved co-complex interactions are novel (i.e., <1/3 curated in CORUM, STRING and GeneMania databases; Fig. 2e). This markedly increases the number of metazoan protein interactions reported to date (Supplementary Table 3), covering roughly 10–25% of the estimated conserved animal cell interactome[26,27], opening up many new avenues of inquiry.

To systematically define evolutionarily conserved functional modules, we partitioned the interaction network using a two-stage clustering procedure (Fig. 1c; see Extended Methods) that allowed proteins to participate in multiple complexes (i.e., moonlighting) as merited (Extended Data Fig. 3a). The 981 putative multiprotein groupings (Fig. 3a; see Supplementary Table 4) include many well-known and novel complexes linked to diverse biological processes (Extended Data Fig. 3b). The complexes have estimated component ages spanning from ~500 million (i.e., metazoan-specific, or new) to over 1 billion years (i.e., ancient, or old) of evolutionary divergence. Details of species, orthologs, taxonomic groups, protein ages and evolutionary distances are provided in Supplementary Tables 3 and 5 and Supplementary Material.
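The projection of conserved interactions onto other sequenced genomes, described above, amounts to interolog transfer: an interaction is carried over to a species when both partners have detectable orthologs there. The sketch below is a simplified illustration that assumes a one-to-one ortholog map per species. The function name, species labels and ortholog identifiers are hypothetical, and the study's actual ortholog mapping is described in its Extended Methods.

```python
def project_interologs(human_pairs, orthologs_by_species):
    """Transfer human co-complex pairs to every species where both partners have an ortholog.

    human_pairs          -- iterable of (human_protein, human_protein) tuples
    orthologs_by_species -- {species: {human_protein: ortholog_id}} (one-to-one, an assumption)
    Returns {species: set of frozenset({ortholog_id, ortholog_id})}.
    """
    projected = {}
    for species, ortholog_of in orthologs_by_species.items():
        pairs = set()
        for a, b in human_pairs:
            if a in ortholog_of and b in ortholog_of:
                oa, ob = ortholog_of[a], ortholog_of[b]
                if oa != ob:  # skip pairs that collapse onto a single gene
                    pairs.add(frozenset((oa, ob)))
        projected[species] = pairs
    return projected

# Hypothetical toy input: three human pairs and two made-up species.
toy_pairs = [("PSMA1", "PSMB1"), ("COMMD2", "COMMD3"), ("BUB3", "ZNF207")]
toy_orthologs = {
    "species_x": {"PSMA1": "x_001", "PSMB1": "x_002", "BUB3": "x_003"},
    "species_y": {"PSMA1": "y_001", "PSMB1": "y_002", "COMMD2": "y_003",
                  "COMMD3": "y_004", "BUB3": "y_005", "ZNF207": "y_006"},
}

toy_projection = project_interologs(toy_pairs, toy_orthologs)
interactions_per_species = {s: len(p) for s, p in toy_projection.items()}
```

A species that has lost one partner of a pair simply receives fewer projected interactions, which is the effect described above for clades with extensive gene loss.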

Extended Data Figure 3

Derivation of complexes

a, The 2,153 proteins present in the 981 derived metazoan complexes participate in multiple assemblies (‘moonlighting’) to an extent comparable to the sharing of subunits reported for literature-derived complexes (CORUM). For comparison, we examined the 1,550 unique proteins from the full CORUM set of 1,216 human complexes passing our selection criteria for supporting evidence (‘Unmerged’) and the 1,461 unique proteins from the non-redundant set of 501 merged complexes used as the reference for splitting our training and testing sets, with some of the largest complexes removed to avoid bias in training (‘Merged’; see ‘Optimizing the two-stage clustering’ in Extended Methods for details). b, Schematic of 981 identified complexes containing 2,153 unique proteins. In this graphical representation, 7,669 co-complex interactions are shown as lines, and proteins as nodes. Red and green interactions were previously annotated in CORUM. Red interactions were used in training the classifier and/or clustering procedure, while green interactions were held out for validation purposes. Gray interactions were not previously annotated in CORUM.

Figure 3

Prevalence of conservation of protein complexes across metazoa and beyond

a, Conserved multiprotein complexes, identified by clustering, arranged according to average estimated component age (see Extended Methods and ref. [25]). Proteins (nodes) classified as metazoan (green) or ancient (orange); assemblies showing divergent phylogenetic trajectories termed ‘mixed’. b, Example complexes with different proportions of old and new subunits. c, Presumed origins of metazoan (new), mixed, and old complexes; ‘?’ indicates variable origins of new genes. d, Heatmap showing prevalence of selected complexes across phyla. Color reflects fraction of components with detectable orthologs (absence, dark blue). Sea anemone (N. vectensis) most distant metazoan (Cnidarian) analyzed biochemically.

Strikingly, although proteins arising in metazoa (i.e. by gene duplication or other means) account for ~3/4 of all human gene products, they form only ~1/3 (39%; 147) of the clusters (Fig. 3a). These ‘new’ complexes tend to be smaller (i.e., ≤3 components; Fig. 3b) and specific (i.e., components not present in ‘mixed’ complexes). This indicates that although protein number and diversity greatly increased with the rise of animals[25], most stable protein complexes were inherited from the unicellular ancestor and subsequently modified slightly over time (Fig. 3c and Supplementary Table 5). Indeed, the dominant phylogenetic profile of complexes across Eukarya (Fig. 3d) is composed either entirely (344 ‘old’ complexes) or predominantly (490 ‘mixed’ complexes) of ancient subunits ubiquitous among eukaryotes (Extended Data Fig. 4a; see Supplementary Table 5 for details), the latter presumably reflecting preferential accretion of new components to pre-existing macromolecules (Fig. 3c)[28].

Extended Data Figure 4

Properties of new and old proteins and complexes

a, The 2,153 protein components in the conserved animal complexes tend to be more ancient than the 2,301 proteins reported in the CORUM reference complexes or in two recent large-scale protein interaction assays, based on either the 7,062 proteins found by affinity purification/mass spectrometry (AP/MS; BioGrid 166968, Huttlin EL (2014/pre-pub), downloaded Feb 10th 2015) or the 3,667 proteins analyzed by yeast two-hybrid assays (Y2H)[10]. Ages are derived from OMA as in ref. [25]. b, Annotation rates (mean count of annotation terms per protein) of old and new proteins in the derived complexes and pairwise PPI, compared with proteins in the CORUM reference complex set. Old proteins (defined by OMA) from the complexes generally exhibited higher annotation rates than new proteins. c, Differential enrichment of old, mixed and metazoan-specific protein complexes for functional annotations (select GO-slim biological process terms shown, top) and protein domains (Pfam, bottom).

These primordial complexes are present throughout the Opisthokonta supergroup (animals and fungi), estimated to be >1 billion years old[29], and Plants (and presumably lost/significantly diverged among parasitic Protists). Reflecting this central importance, these complexes tend strongly to be ubiquitously expressed throughout all cell types and tissues (Extended Data Fig. 5a), are abundant (Extended Data Fig. 5b), and are enriched for associations to human disease and perturbation phenotypes in C. elegans (Supplementary Table 6). In comparison with other proteins in the 16,655 interactions, the older, conserved proteins present in these stable complexes have lower average domain complexity (p < 0.02; see Extended Methods), suggesting multi-domain architectures underlie more transient or tissue-specific interactions. Notably, whereas ‘mixed’ and ‘old’ complexes are enriched for functional associations with core cellular processes, such as metabolism (Extended Data Fig. 4c), the strictly metazoan complexes were far more likely to be linked to cell adhesion, organization and differentiation, consistent with roles in multicellularity. Reflecting these different evolutionary trajectories, ‘new’ clusters are substantially more enriched for cancer-related proteins (42%; 62/147; hypergeometric p ≤ 10^−5) compared to strictly ‘old’ (15%; 53/344; p ≤ 10^−3) clusters (Z-test p < 0.0001) (Supplementary Table 7), have generally lower annotation rates (Extended Data Fig. 4b), and show different preponderances of protein domains (Extended Data Fig. 4c and Supplementary Table 6).
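The comparison of cancer-related cluster fractions above (62 of 147 ‘new’ versus 53 of 344 ‘old’) can be reproduced approximately with a two-proportion Z-test. The text does not state which form of Z-test was used, so the pooled-variance, two-sided version below is an assumption, and the function name is illustrative.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Counts quoted in the text: cancer-related clusters among 'new' and strictly 'old' clusters.
z_stat, p_val = two_proportion_z(62, 147, 53, 344)
```

Under these assumptions the statistic comes out at roughly z = 6.4, which is consistent with the reported p < 0.0001.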

Extended Data Figure 5

Abundance and expression trends for proteins in complexes

Proteins within the identified complexes tend to be ubiquitously expressed across human tissues. Pie charts show the proportions of proteins with varying tissue expression patterns, from a recently published human tissue proteome map[46], comparing: a, the full set of 20,258 human proteins, with b, the 2,131 proteins within the identified complexes. Consistent with these observations, 91% of the protein components in the complexes were expressed in >15 tissues in data from a reference human proteome[23], compared to less than half (46%) of the 17,294 proteins in the overall reference set (Z-test p < 0.001). The distributions of average mRNA and protein abundances for all proteins identified and those within complexes are shown in panel c, mRNA abundances (data from EBI accession E-MTAB-1733) and d, protein abundances (data from PaxDb integrated dataset, 9606-H.sapiens_whole_organism-integrated_dataset). Evolutionarily ‘old’ proteins (defined by OMA as described in ref. [25] and mentioned earlier) tend towards higher abundances, even for proteins in reference complexes.

We used multiple approaches to assess the accuracy (Fig. 4) and functional significance (Fig. 5) of the predicted complexes. First, we performed affinity purification-mass spectrometry (AP/MS) experiments on select novel complexes from the ‘new’, ‘old’ and ‘mixed’ age clusters, validating most associations in both worm and human (Fig. 4a, Extended Data Fig. 6a). We next performed a global validation by comparing our derived complexes to a newly reported large-scale AP/MS study of 23,756 putative human protein interactions detected in cell culture (BioGrid pre-publication 166968, Huttlin EL et al., downloaded Feb. 10, 2015), and observed a partial, but exceptionally significant, overlap to a degree comparable to literature-derived complexes (Fig. 4b, Extended Data Fig. 6b).

Figure 4

Physical validation of complexes

a, Verification of complexes from tagged human cell lines and transgenic worms (see Extended Methods). Inset reports spectral counts obtained in replicate AP/MS analyses of indicated bait protein (header). MIB2-VPS4 complex confirmed by co-IP (Extended Data Fig. 6a). b, Conserved complexes significantly overlap large-scale AP/MS data reported for human cell lines (BioGrid pre-pub 166968, Huttlin et al., 2015) to a comparable extent as literature reference sets[39,42], using 3 measures of complex-level agreement (see Extended Methods, Extended Data Fig. 6b); ***, p-value < 0.001, determined by shuffling (gray distributions). c, Agreement of inferred molecular weights (MW) of human protein complexes with size exclusion chromatography (SEC) profiles (data from ref. [43]). d, Co-elution of human Commander complex subunits by SEC consistent with an approx. 500 kDa particle.

Figure 5

Functional validation of complexes

a, Morpholino knockdown of COMMD2 (n = 55 animals, 2 clutches, 1 eye each) or COMMD3 (n = 64) in X. laevis embryos causes defective head and eye development (control n = 57; Extended Data Fig. 9f, h). ***, p < 0.0001, 2-sided Mann-Whitney test. b, COMMD2/3 knockdown animals (5 embryos per treatment examined) show altered neural patterning, including posterior shift or loss of expression of mid-brain marker EN2 and KROX20(EGR1), the latter in rhombomeres R3/R5 (compare to Extended Data Fig. 9g, h). c, Enhanced embryonic lethality (i.e., epistasis) following RNAi knockdown in C. elegans of B0035.1 (ZNF207) and bub-3 together (eggs laid: HT115, 1308; B0035.1, 1096; bub-3, 445; bub-3 + B0035.1, 341). d, Enhanced sensitivity (mean +/− s.d. across four cell culture experiments) of two independent CCDC97-knockout lines to the SF3b inhibitor pladienolide B (PB) relative to control HEK293 cells. e, Enrichment (permutation test p-value) for interactions among sequential pathway components and metabolic enzymes relative to shuffled controls (n refers to enzyme index, where n,n+1 denotes sequential enzymes, n,n+2 sequential-but-one, etc, as described in SI, “Analysis of consecutively acting signal transduction and metabolic enzyme interactions”). f, Metabolic channeling as opposed to traditional (typical) two-step cascade model. g, Conserved interactions among consecutively acting enzymes involved in purine biosynthesis (2 representative co-fractionation profiles of the 69 total generated are shown).

Extended Data Figure 6

Additional validation data

a, Confirmation of MIB2 interactions by co-immunoprecipitation. Extract (~10 mg protein) from cultured human HCT116 cells expressing FLAG-tagged MIB2 or control (WT) cells was incubated with 100 μl anti-FLAG M2 resin for 4 h by gently rotating at 4°C. After extensive washing with RIPA buffer, co-purifying proteins bound to the beads were eluted by the addition of 25 μl Laemmli loading buffer at 95 °C. Polypeptides were separated by SDS-PAGE and immunoblotted using FLAG, VPS4A, VPS4B or IST1 antibodies as indicated (expanded gel images provided in SI). b, Protein co-complex interactions reported in the CYC2008 yeast protein complex database[42] are reconstructed accurately from the co-fractionation data, regardless of whether the full set of co-fractionation plus external data are used to derive protein interactions (‘All data’, see also Fig. 4b) or if the external yeast data was specifically excluded from the analyses (‘All data, excluding yeast’).

We also observed broad agreement between the derived complexes’ inferred molecular weights (assuming 1:1 stoichiometries) and migration by size exclusion chromatography (Fig. 4c; Extended Data Fig. 7a) and density gradient centrifugation (Extended Data Fig. 7b). A prime example is the coherent profiles of a large (~500 kDa) ‘mixed’ complex with several unannotated components (Fig. 4d; Extended Data Fig. 8), dubbed Commander because most subunits share COMM (copper metabolism MURR1) domains[30] implicated in copper toxicosis[31], among other roles[30,32]. Commander contains coiled-coil domain proteins CCDC22 and CCDC93 (Figs. 4a, d) in addition to ten COMM domain proteins, broadly supported by co-fractionation in human, fly and sea urchin (Extended Data Fig. 9a–c and Supporting Web Site).

Extended Data Figure 7

Agreement of derived complexes’ molecular weights with measurement by HPLC and density centrifugation

a, CORUM reference complexes’ inferred molecular weights (MW) are consistent with their components’ average cumulative size exclusion chromatograms. The molecular weight of each complex was calculated as the sum of putative component molecular weights, assuming 1:1 stoichiometry. Data from ref. [43] were analyzed as in Fig. 4c and show a similar trend as for the derived complexes. b, Derived complexes’ inferred molecular weights (MW) are broadly consistent with their components’ average cumulative ultracentrifugation profiles on a sucrose density gradient. Average profiles are plotted for X. laevis orthologs, based on a preparation of hemoglobin-depleted heart and liver proteins separated on a 7 – 47% sucrose density gradient, as described in the Extended Methods.
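The inferred molecular weight used in these comparisons is the sum of component masses under a 1:1 stoichiometry assumption. A minimal sketch of that calculation follows; the function name, the optional copy-number argument and the subunit masses are hypothetical placeholders, not measured values.

```python
def inferred_complex_mw(subunits, subunit_mw_kda, copies=None):
    """Sum subunit masses (kDa); each subunit counts once unless a copy number is given."""
    copies = copies or {}
    return sum(subunit_mw_kda[s] * copies.get(s, 1) for s in subunits)

# Hypothetical subunit masses in kDa, for illustration only.
toy_masses = {"sub1": 21.0, "sub2": 22.5, "sub3": 70.5, "sub4": 73.0}

mw_one_to_one = inferred_complex_mw(["sub1", "sub2", "sub3", "sub4"], toy_masses)
mw_with_dimer = inferred_complex_mw(["sub1", "sub2", "sub3", "sub4"], toy_masses,
                                    copies={"sub3": 2})
```

The optional `copies` argument shows how sensitive the estimate is to the stoichiometry assumption: a complex that elutes at a larger apparent size than its 1:1 sum may contain multiple copies of some subunits or components not yet assigned to it.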

Extended Data Figure 8

Distribution of uncharacterized proteins and novel interactions across the 981 derived complexes

Complexes were sorted by median age (defined by OMA). Among 2,153 unique proteins, 293 (red) lack Gene Ontology (GO) functional annotations, while 1,756 of 7,665 co-complex interactions are novel (light green) (i.e., not listed in iRef curation database).

Extended Data Figure 9

Properties of the Commander complex

The automatically-derived 8 subunit Commander complex (Fig. 3b) was subsequently extended to 13 subunits (COMMD1 to 10, CCDC22, CCDC93, and SH3GLB1) based on combined analysis of AP-MS (Fig. 4a), size exclusion chromatograms[43] (Fig. 4d), published pairwise interactions[30,47,48], and analysis of elution profiles of the remaining COMM domain containing proteins, as shown here. Example protein elution profiles are plotted for Commander complex subunits observed from: a, HEK293 cell nuclear extract; b, sea urchin embryonic (5 days post-fertilization) extract; and c, fly SL2 cell nuclear extract; each fractionated by heparin affinity chromatography. d, Co-expression of Commander complex subunits during embryonic development of X. tropicalis (plotting mean +/− s.d. of 3 clutches; data from ref. [49]). e, mRNA expression patterns of Commander complex subunits in stage 15 X. laevis embryos. Images show coordinated spatial expression in early vertebrate embryogenesis, as measured by in situ hybridization (3 embryos examined). f, Knockdown of Commd2 induced marked head and eye defects in developing X. laevis. (top) Commd2 antisense knockdown significantly decreased eye size, shown for stage 38 tadpoles (from 3 clutches; control n = 47 animals, 1 eye each); phenotypes were consistent between translation blocking (MOatg; n = 60) morpholino reagents, splice site blocking (MOsp; n = 50) morpholinos, and knockdowns of interaction partner Commd3 (see Fig. 5a). ***, p < 0.0001, 2-sided Mann-Whitney test. (bottom) Commd2 knockdown induced altered Pax6 patterning in the embryonic eye (control n = 8 animals, 2 eyes each; MO n = 11). g, Commd2/3 knockdown animals show altered neural patterning. Changes in stage 15 X. laevis embryos, measured by in situ hybridization (assayed in duplicates; 5 embryos per treatment), seen upon knockdown but not in controls: the forebrain marker PAX6 was expanded, while the mid-brain marker EN2 was strongly reduced. Strikingly, while expression of KROX20/EGR1 in rhombomere R3 was shifted posteriorly, expression in R5 was strongly reduced or entirely absent. Panels in Fig. 5b are reproduced from this figure and are directly comparable. h, Confirmation of splice-blocking Commd2 morpholino activity. Images and schematic show the basis and results of RT-PCR and agarose gel electrophoresis obtained with the corresponding X. laevis knockdown tadpoles.

We found an unexpected role in embryonic development for Commander, whose subunits are strongly co-expressed in the developing frog (Extended Data Fig. 9d, e). Strikingly, tadpoles with morpholino-mediated COMMD2/3 knockdown showed impaired head and eye development (Fig. 5a; Extended Data Fig. 9f, h), as well as defective neural patterning and altered expression of the brain markers PAX6, EN2 and KROX20/EGR1 (Fig. 5b; Extended Data Fig. 9g, h). Given CCDC22’s recent link[33,34] to human syndromes of intellectual disability, malformed cerebellum and craniofacial abnormalities, the deep conservation of the Commander complex suggests COMMD2/3 as strong candidates in the etiology of these heterogeneous disorders.

Among metazoan-specific protein complexes, we confirmed physical and functional associations of the spindle checkpoint protein BUB3 with ZNF207, a zinc finger protein conspicuously lacking orthologs in cnidarians and fungi. ZNF207 binds Bub3 via a Gle2-binding-sequence (GLEBS) motif[35] restricted to deuterostomes and protostomes (Extended Data Fig. 10a). As in human, knockdown of ZNF207 in C. elegans enhanced lethality due to impaired Bub3-mediated checkpoint arrest (Fig. 5c).

Extended Data Figure 10

Supporting data for BUB3 and CCDC97 experiments

a, Sequence alignment showing conservation of the ZNF207 GLEBS domain. b, Targeted CRISPR/Cas9-induced knockout of CCDC97 in two independent lines of human HEK293 cells, verified by Western blotting (expanded gel images provided in SI), also results in a slight decrease in levels of the annotated SF3B3 component. c, Loss of CCDC97 impairs cell growth. Lines show growth curves of control versus knockout cell lines in two biological replicate assays.

Among ‘mixed’ complexes, we confirmed metazoan-specific coiled-coil domain protein CCDC97 as a sub-stoichiometric component of human and worm SF3B spliceosomal complex involved in branch site recognition (Fig. 4a). Consistent with a possible role in pre-mRNA splicing, CRISPR-based CCDC97 knockout human cells were slower growing than control lines (Extended Data Fig. 10b, c) and hypersensitive to pladienolide B (Fig. 5d), a macrolide inhibitor of SF3b[36].

Knowledge of conserved macromolecular associations provides a roadmap for additional functional inferences. For instance, fractionation profiles can be compared for any pair of proteins in our dataset to search for evidence of interactions. Notably, we found significant enrichment for interactions among pairs of human proteins acting sequentially in annotated pathways[37] (Fig. 5e), especially G protein and MAP kinase cascades (Supplementary Table 8). Enzymes acting consecutively in core metabolic reactions (Fig. 5f) also showed a higher tendency to interact (Supplementary Table 8), whose significance decayed with more intervening steps (Fig. 5e). For example, strong consecutive interactions were apparent within the widely conserved purine biosynthetic pathway, with enzymes (e.g. PAICS, GART) eluting in two peaks (Fig. 5g), one coincident with the prior enzyme and the second with the downstream enzyme, suggestive of substrate channeling[38].

Despite the diversity of multicellular organisms, our study reveals fundamental attributes of the macromolecular machinery of animal cells with near-universal pertinence to metazoan biology, development and evolution. Our massive set of supporting biochemical fractionation data (via ProteomeXchange with identifiers PXD002319-PXD002328), PPIs (via BioGRID; http://thebiogrid.org/185267/publication/panorama-of-ancient-metazoan-macromolecular-complexes.html) and interaction network projections are fully accessible (http://metazoa.med.utoronto.ca) to facilitate in-depth exploration. Although we focused on global conservation properties, these data can be analyzed at the individual animal species or complex levels to assess the variety and functional adaptations of particular protein assemblies across phyla.
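
The pairwise comparison of fractionation profiles mentioned above can be illustrated with a minimal sketch. It assumes each protein is represented by a list of abundance values (for example, spectral counts) across the same ordered set of fractions, and it uses Pearson correlation as a simple stand-in similarity measure; this is an illustrative choice, not necessarily the scoring used in the study. The protein names and numbers are hypothetical and are not taken from the deposited data.

```python
from math import sqrt


def pearson(profile_a, profile_b):
    """Pearson correlation between two equal-length elution profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same fractions")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-elution evidence
    return cov / sqrt(var_a * var_b)


def rank_partners(query, profiles):
    """Rank all other proteins by profile similarity to the query protein."""
    scores = [
        (name, pearson(profiles[query], values))
        for name, values in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical abundance values across six fractions (illustration only).
profiles = {
    "protein_A": [0, 2, 10, 3, 0, 0],
    "protein_B": [0, 1, 8, 2, 0, 0],   # peaks in the same fraction as protein_A
    "protein_C": [5, 0, 0, 0, 1, 7],   # elutes in different fractions
}

for name, score in rank_partners("protein_A", profiles):
    print(f"{name}\t{score:.3f}")
```

A high correlation in a single fractionation is only weak evidence of association, since unrelated proteins can co-elute by chance; agreement across several independent separations is more convincing.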


47 in total

1. Preferential attachment in the protein network evolution.

Authors: Eli Eisenberg; Erez Y Levanon
Journal: Phys Rev Lett Date: 2003-09-26 Impact factor: 9.161

Review 2. Comparative interactomics: comparing apples and pears?

Authors: Lars Kiemer; Gianni Cesareni
Journal: Trends Biotechnol Date: 2007-09-07 Impact factor: 19.536

3. Proteomics. Tissue-based map of the human proteome.

Authors: Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal: Science Date: 2015-01-23 Impact factor: 47.728

4. A protein complex network of Drosophila melanogaster.

Authors: K G Guruharsha; Jean-François Rual; Bo Zhai; Julian Mintseris; Pujita Vaidya; Namita Vaidya; Chapman Beekman; Christina Wong; David Y Rhee; Odise Cenaj; Emily McKillip; Saumini Shah; Mark Stapleton; Kenneth H Wan; Charles Yu; Bayan Parsa; Joseph W Carlson; Xiao Chen; Bhaveen Kapadia; K VijayRaghavan; Steven P Gygi; Susan E Celniker; Robert A Obar; Spyros Artavanis-Tsakonas
Journal: Cell Date: 2011-10-28 Impact factor: 41.582

5. CCDC22: a novel candidate gene for syndromic X-linked intellectual disability.

Authors: I Voineagu; L Huang; K Winden; M Lazaro; E Haan; J Nelson; J McGaughran; L S Nguyen; K Friend; A Hackett; M Field; J Gecz; D Geschwind
Journal: Mol Psychiatry Date: 2011-08-09 Impact factor: 15.992

6. A map of the interactome network of the metazoan C. elegans.

Authors: Siming Li; Christopher M Armstrong; Nicolas Bertin; Hui Ge; Stuart Milstein; Mike Boxem; Pierre-Olivier Vidalain; Jing-Dong J Han; Alban Chesneau; Tong Hao; Debra S Goldberg; Ning Li; Monica Martinez; Jean-François Rual; Philippe Lamesch; Lai Xu; Muneesh Tewari; Sharyl L Wong; Lan V Zhang; Gabriel F Berriz; Laurent Jacotot; Philippe Vaglio; Jérôme Reboul; Tomoko Hirozane-Kishikawa; Qianru Li; Harrison W Gabel; Ahmed Elewa; Bridget Baumgartner; Debra J Rose; Haiyuan Yu; Stephanie Bosak; Reynaldo Sequerra; Andrew Fraser; Susan E Mango; William M Saxton; Susan Strome; Sander Van Den Heuvel; Fabio Piano; Jean Vandenhaute; Claude Sardet; Mark Gerstein; Lynn Doucette-Stamm; Kristin C Gunsalus; J Wade Harper; Michael E Cusick; Frederick P Roth; David E Hill; Marc Vidal
Journal: Science Date: 2004-01-02 Impact factor: 47.728

7. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.

Authors: Nevan J Krogan; Gerard Cagney; Haiyuan Yu; Gouqing Zhong; Xinghua Guo; Alexandr Ignatchenko; Joyce Li; Shuye Pu; Nira Datta; Aaron P Tikuisis; Thanuja Punna; José M Peregrín-Alvarez; Michael Shales; Xin Zhang; Michael Davey; Mark D Robinson; Alberto Paccanaro; James E Bray; Anthony Sheung; Bryan Beattie; Dawn P Richards; Veronica Canadien; Atanas Lalev; Frank Mena; Peter Wong; Andrei Starostine; Myra M Canete; James Vlasblom; Samuel Wu; Chris Orsi; Sean R Collins; Shamanta Chandran; Robin Haw; Jennifer J Rilstone; Kiran Gandi; Natalie J Thompson; Gabe Musso; Peter St Onge; Shaun Ghanny; Mandy H Y Lam; Gareth Butland; Amin M Altaf-Ul; Shigehiko Kanaya; Ali Shilatifard; Erin O'Shea; Jonathan S Weissman; C James Ingles; Timothy R Hughes; John Parkinson; Mark Gerstein; Shoshana J Wodak; Andrew Emili; Jack F Greenblatt
Journal: Nature Date: 2006-03-22 Impact factor: 49.962

8. Splicing factor SF3b as a target of the antitumor natural product pladienolide.

Authors: Yoshihiko Kotake; Koji Sagane; Takashi Owa; Yuko Mimori-Kiyosue; Hajime Shimizu; Mai Uesugi; Yasushi Ishihama; Masao Iwata; Yoshiharu Mizui
Journal: Nat Chem Biol Date: 2007-07-22 Impact factor: 15.040

9. How complete are current yeast and human protein-interaction networks?

Authors: G Traver Hart; Arun K Ramani; Edward M Marcotte
Journal: Genome Biol Date: 2006 Impact factor: 13.583

10. The Reactome pathway knowledgebase.

Authors: David Croft; Antonio Fabregat Mundo; Robin Haw; Marija Milacic; Joel Weiser; Guanming Wu; Michael Caudy; Phani Garapati; Marc Gillespie; Maulik R Kamdar; Bijay Jassal; Steven Jupe; Lisa Matthews; Bruce May; Stanislav Palatnik; Karen Rothfels; Veronica Shamovsky; Heeyeon Song; Mark Williams; Ewan Birney; Henning Hermjakob; Lincoln Stein; Peter D'Eustachio
Journal: Nucleic Acids Res Date: 2013-11-15 Impact factor: 16.971
