CORUM: the comprehensive resource of mammalian protein complexes-2019.

Madalina Giurgiu¹, Julian Reinhard¹, Barbara Brauner¹, Irmtraud Dunger-Kaltenbach¹, Gisela Fobo¹, Goar Frishman¹, Corinna Montrone¹, Andreas Ruepp¹.

Abstract

CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (67%), mouse (15%) and rat (10%). Given the vital functions of these macromolecular machines, their identification and functional characterization are foundational to our understanding of normal and disease biology. The new CORUM 3.0 release encompasses 4274 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 4473 different genes, representing 22% of the protein-coding genes in humans. Protein complexes are described by a protein complex name, subunit composition, cellular functions and literature references. Information about the stoichiometry of subunits depends on the availability of experimental data. Recent developments include a graphical tool displaying known interactions between subunits. This allows the prediction of structural interconnections within protein complexes of unknown structure. In addition, we present a set of 58 protein complexes with alternatively spliced subunits. These subunits were found to affect cellular functions such as regulation of apoptotic activity or protein complex assembly, or to define cellular localization. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.

Year: 2019 PMID: 30357367 PMCID: PMC6323970 DOI: 10.1093/nar/gky973

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Understanding biological processes at the cellular and systems levels is an important task in the study of all living organisms. Protein complexes play critical roles in an array of biological processes, including protein synthesis, signaling and cellular degradation processes. To date, there are no reliable estimates of the total number of protein complexes in cells (the complexome), but data from single-celled organisms provide evidence that more than half of the gene products are involved in the formation of protein complexes (1). According to estimates from Berggård et al. (2), more than 80% of proteins work in complexes. Many proteins are subunits of more than one complex, which extends the number of potential protein complexes. For example, the RING-box protein 1 (RBX1), which was present in 35 protein complexes in the 2009 release of CORUM (3), is now found in 65 protein complexes. Due to the importance of the topic, several endeavors have been undertaken to unravel the cellular complexome. The first large-scale screens for protein complexes were performed in budding yeast (4,5) and discovered 491 and 547 complexes, respectively. Recent analyses of the interactome/complexome in human cells revealed a wealth of novel information (6–8). Integration of published datasets in the human protein complex map (hu.MAP) resulted in 4659 complexes composed of 7777 unique proteins (9). For many years, the composition of thousands of protein complexes has been analysed in individual experiments and published in the scientific literature. In addition to the identification of subunit composition, which is the standard in high-throughput experiments, these complexes are also characterized with respect to their cellular function, association with diseases and sometimes stoichiometry. In order to provide a high-quality resource of information on mammalian protein complexes, we generated the comprehensive resource of mammalian protein complexes (CORUM) with 1750 complexes in the first release (10).
The CORUM release 3.0 presents a significantly extended dataset that now consists of 4274 mammalian protein complexes. A graphical analysis tool was implemented that displays potential protein–protein interactions between the subunits of protein complexes. The tool is based on the Cytoscape JavaScript library and uses data for mammalian organisms from the IntAct database. Finally, we present a collection of 58 protein complexes in which alternative splice variants result in altered complex function or affect diseases. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.

DATABASE DESCRIPTION AND NEW DEVELOPMENTS

Design and application of CORUM

Our goal was the generation of a reference dataset of protein complex information from mammalian organisms. To ensure high quality and reliability of the data, we only include protein complexes that have been isolated and characterized in individual experiments. Although protein complex information from high-throughput approaches provides an invaluable resource of novel information, we do not include it in CORUM as these data are usually not corroborated by experiments unraveling the biological function of complexes. Of note, our novel core set from CORUM 3.0 (3512 protein complexes) and the hu.MAP dataset share only 29 identical protein complexes (9). Experienced biocurators critically extract information from the scientific literature and transfer it into CORUM using established vocabularies and stable identifiers from well-known resources such as UniProt and Gene Ontology. The major focus for the application of CORUM lies in network biology and network medicine. Hence, the dataset is biased toward protein complexes that consist of at least two different subunits. According to Teichmann (11), the majority of (known) protein complexes in PDB are homomers. Based on recent statistics from PDB, 9206 homomeric and 2677 heteromeric protein structures have been determined using X-ray crystallography. Of note, these data cover all groups of organisms. In the UniProt release 06_2018, 2344 proteins from the taxon Homo sapiens were annotated either as homodimer or homotetramer (12). In contrast, CORUM includes only 126 homomers. Apart from this exception, CORUM aims to offer a representation of the complexome of mammalian cells. Since the previous CORUM release, basic annotation topics such as protein complex name, identification of subunits based on UniProt identifiers, references to the articles used and comments have remained unchanged. For functional annotation of protein complexes, we have offered a mapping to Gene Ontology (GO) terms since CORUM 2.0.
In the meantime, we switched manual functional annotation of novel complexes completely to GO terms (Figure 1).
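The comparison of identical complexes between two datasets mentioned above can be illustrated with a minimal sketch. It assumes that two complexes count as identical when their subunit sets match exactly; the source does not state the matching criterion that was used, and the identifiers below are placeholders rather than real CORUM or hu.MAP entries.

```python
from typing import Dict, FrozenSet, Iterable, Set


def as_subunit_sets(complexes: Dict[str, Iterable[str]]) -> Set[FrozenSet[str]]:
    """Represent each complex as an order-independent set of subunit identifiers."""
    return {frozenset(subunits) for subunits in complexes.values()}


def identical_complexes(
    a: Dict[str, Iterable[str]], b: Dict[str, Iterable[str]]
) -> Set[FrozenSet[str]]:
    """Return the subunit sets that occur in both datasets with the same composition."""
    return as_subunit_sets(a) & as_subunit_sets(b)


# Toy data with placeholder identifiers (assumption: not taken from any real dataset).
curated = {"c1": ["P1", "P2", "P3"], "c2": ["P4", "P5"]}
predicted = {"h1": ["P3", "P2", "P1"], "h2": ["P4", "P5", "P6"]}

shared = identical_complexes(curated, predicted)
print(len(shared))
```

Under this strict criterion, a predicted complex that contains one additional subunit (such as `h2` here) does not count as identical, which is one reason why exact overlaps between curated and high-throughput datasets can be small.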

Figure 1.

Annotation of protein complexes in CORUM. Presentation of the Ubiquitin E3 ligase (CUL3, KLHL20, RBX1) complex in CORUM. Functional annotation using terms from Gene Ontology is automatically translated into the respective terms from the FunCat annotation scheme.

Compared to the previous CORUM publication (3), the content of the dataset has increased from 2837 to 4274 protein complexes (Figure 2). In particular, the core set has grown considerably. The core set is a subset of CORUM that is reduced by eliminating redundant information such as protein complexes that were characterized from different mammalian species or complexes that were isolated by different methods. The core set now includes 3512 protein complexes, which is a gain of 70% compared to CORUM 2.0. Regarding the organisms that were used to characterize protein complexes, there are only minor changes compared with the previous CORUM publication. The largest fraction of complexes was isolated from Homo sapiens (67%), followed by mouse (15%) and rat (10%). Other complexes were isolated from organisms such as cattle, pig or rabbit, or are composed of subunits from different organisms. The growth of the CORUM dataset is also accompanied by a higher total number of different gene products that are used as protein complex subunits. While in CORUM 2.0 complexes were composed of 3189 different proteins, the number has now increased to 4473. This represents 22% of all known protein-coding genes according to the Ensembl database version 92.38 (13).
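The idea of a redundancy-reduced core set can be sketched as follows. This is only an illustration under stated assumptions: the actual curation rules are not given here, so the sketch simply treats entries as redundant when their subunit gene symbols match after upper-casing (a crude stand-in for cross-species matching) and keeps the first entry seen. The `ComplexEntry` type and all values are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class ComplexEntry(NamedTuple):
    """Hypothetical record type; field names are assumptions for illustration."""
    complex_id: int
    organism: str
    gene_symbols: FrozenSet[str]


def composition_key(entry: ComplexEntry) -> FrozenSet[str]:
    """Case-insensitive subunit composition (assumed redundancy criterion)."""
    return frozenset(symbol.upper() for symbol in entry.gene_symbols)


def reduce_to_core(entries: Iterable[ComplexEntry]) -> List[ComplexEntry]:
    """Keep the first entry for each distinct composition, preserving input order."""
    first_seen: Dict[FrozenSet[str], ComplexEntry] = {}
    for entry in entries:
        first_seen.setdefault(composition_key(entry), entry)
    return list(first_seen.values())


# Toy entries with placeholder gene symbols (not real CORUM records).
entries = [
    ComplexEntry(1, "Human", frozenset({"GENEA", "GENEB"})),
    ComplexEntry(2, "Mouse", frozenset({"Genea", "Geneb"})),
    ComplexEntry(3, "Human", frozenset({"GENEA", "GENEC"})),
]
core = reduce_to_core(entries)
print([entry.complex_id for entry in core])
```

A real implementation would more plausibly rely on orthology mappings than on symbol case, and might prefer a particular organism or evidence type when choosing which entry to retain.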

Figure 2.

Data growth in CORUM. The plot compares the data content of CORUM versions 1.0, 2.0 and 3.0. It includes the number of articles that were used to create the datasets, the number of core set complexes, the total number of protein complexes as well as the total number of different proteins that are found in the dataset. As we did not provide a core set in CORUM 1.0, the respective number is missing.

In recent years, the CORUM dataset has been used in a large number of analyses as a reference dataset for benchmarking computational models and high-throughput experimental data. One example is the combination of results from three pioneering high-throughput investigations to present a comprehensive complexome of human cells (9). In addition, protein complex information from CORUM is increasingly applied to the analysis of disease mechanisms. Use cases are network-based in silico drug efficacy screening (14), pituitary hormone deficiency in inherited gingival fibromatosis (15) or the architecture of human protein communities and disease networks (16). In the area of diseases, the CORUM dataset is particularly often applied in cancer analyses, as demonstrated by more than ten citations during the past 2 years. These include protein-interaction network associated analyses of MLL(KMT2A)-fusion proteins in leukemia (17) and the detection of dysregulated protein-association networks in breast cancer cell lines (18). Besides the analysis of experimental data, CORUM is also used by other data resources such as ‘Interactome INSIDER’ (19) or the ‘MouseNet v2’ database of gene networks (20). Finally, CORUM is cross-referenced by the UCSC Genome Browser (21) and UniProt (12).

Annotation of splice variant complexes and their functional implications

To generate a large number of different gene products from comparatively small genomes, mammals make use of processes such as alternative splicing; in humans, for example, this yields more than 82 335 distinct mRNAs according to the GENCODE release 28 (22). Accordingly, mammalian protein complexes also exist in different isoforms, with the composition varying across cell types and conditions (23). In our new CORUM release, we provide a dataset of 58 protein complexes that contain alternatively spliced subunits. Those subunits were found to alter cellular functions: for example, the PRC3 complex (with EED isoform 3) methylates nucleosomal histone H3-lysine 27, whereas the PRC2 complex (with EED isoform 1) preferentially methylates nucleosomal histone H1-lysine 26 (24). The caspase-2 gene products are an example in which alternative splicing affects protein complex function via altered protein binding. The short isoform, CASP-2S, inhibits apoptosis, whereas the long isoform, CASP-2L, promotes apoptosis. The CASP-2S–fodrin complex inhibits DNA damage-induced cytoplasmic fodrin cleavage. This process inhibits membrane blebbing and phosphatidylserine externalization, which are indicative of apoptosis in cancer cells. The molecular basis for the different activities of the two CASP-2 variants is that, in contrast to CASP-2S, the long isoform CASP-2L does not interact with fodrin (25). The effect of alternatively spliced subunits of the cytoskeletal protein dystrophin can be illustrated for DAPCs (dystrophin-associated protein complexes). Duchenne muscular dystrophy is caused by the absence of dystrophin (26). The DAPC is destabilized when dystrophin is absent, which leads to downregulated levels of the member proteins (27). Two dystrophin 71 isoforms (Dp71d and Dp71f) form multi-protein complexes in hippocampal neurons. Dp71d–DAPC is mainly localized in bipolar GABAergic and Dp71f–DAPC in multipolar glutamatergic hippocampal neurons.
The subunit composition of these protein complexes seems to affect their neuronal phenotype (28). It has been described that Dp71d–DAP complexes were present only in the nuclei of non-neuronal cells (29). However, recently it was demonstrated for the first time that isoform-containing complexes Dp71f–DAP and Dp71fd–DAP were localized in the nucleus of primary hippocampal neurons (28). This example shows that alternative splicing variants as subunits may regulate not only specific activities but also the tissue-specific localization of protein complexes. The splice variant dataset illustrates that functional properties of particular protein complexes can only be represented by isoform-specific variant annotation. The splice variant complexes can be downloaded as a separate dataset.
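Why isoform-specific annotation matters for data modelling can be shown with a small sketch. It assumes subunits are stored as (gene symbol, isoform number) pairs; this representation, the function names and the abbreviated subunit lists are illustrative assumptions and do not reflect the actual CORUM schema or full complex compositions.

```python
from typing import FrozenSet, Iterable, Tuple

# (gene symbol, isoform number); the pairing scheme is an assumption for illustration.
Subunit = Tuple[str, int]


def gene_level_key(subunits: Iterable[Subunit]) -> FrozenSet[str]:
    """Composition key that ignores which isoform of each gene product is present."""
    return frozenset(gene for gene, _isoform in subunits)


def isoform_level_key(subunits: Iterable[Subunit]) -> FrozenSet[Subunit]:
    """Composition key that keeps the isoform of every subunit."""
    return frozenset(subunits)


# Abbreviated, illustrative compositions differing only in the EED isoform.
complex_with_eed_isoform_1 = [("EZH2", 1), ("EED", 1)]
complex_with_eed_isoform_3 = [("EZH2", 1), ("EED", 3)]

same_at_gene_level = gene_level_key(complex_with_eed_isoform_1) == gene_level_key(
    complex_with_eed_isoform_3
)
same_at_isoform_level = isoform_level_key(
    complex_with_eed_isoform_1
) == isoform_level_key(complex_with_eed_isoform_3)
print(same_at_gene_level, same_at_isoform_level)
```

With gene-level keys the two variants collapse into a single entry, so their different functional annotations could not be attached to distinct records; isoform-level keys keep them apart.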

Prediction of protein–protein interactions within protein complexes

Although there is a growing number of protein complexes with structure information, characterization of complexes is usually restricted to the identification of the subunits and biomedical information such as cellular function or association with diseases. Structural information such as the stoichiometry of individual subunits is rarely determined. In the articles used for CORUM, we found only 288 protein complexes with stoichiometry data. Detailed structural information obtained by X-ray crystallography or nuclear magnetic resonance is even sparser. In the new CORUM dataset 3.0, we have annotated 109 protein complexes with the PSI–MI interaction method ‘MI:0114 x-ray crystallography’. Preliminary information about protein complex structure with respect to neighboring proteins can be obtained by serial protein interaction experiments. This was successfully applied for elucidation of the chaperonin TRiC/CCT (30). In order to provide users with the option to inspect putative interactions between complex subunits, the new CORUM release provides a graphical tool that integrates known PPI information. A large publicly available collection of protein–protein interactions is provided by the IntAct database, which includes data from different resources (31). For the CORUM tool, we used the ‘physical association’ interactions of IntAct, release 05 November 2018, from the organisms Homo sapiens, Rattus norvegicus and Mus musculus. The largest fraction of PPI information was obtained from Homo sapiens (238 841 entries), followed by mouse (15 138) and rat (3962). For visualization of interactions between protein complex subunits we use Cytoscape. Cytoscape.js is a JavaScript-based graph-visualization library that we embedded in version 3.2 on our website. For each protein complex that contains at least one interaction, we show all protein–protein interactions according to the IntAct dataset in a graph below the list of search results.
For the Fanconi anemia FAAP100 complex (complex 6884), for example, nine interactions between subunits and two self-associations were found (Figure 3). Five interactions between FANCA, FANCC, FANCF and FANCG give rise to the speculation that these four proteins build a core of the complex. This is corroborated by analyses of four forms of the Fanconi anemia core complexes from different subcellular compartments (32). All forms of the complex contained at least these four proteins.

Figure 3.

Protein–protein interactions between protein complex subunits. Based on data from the IntAct database, validated protein–protein interactions of the Fanconi anemia FAAP100 complex (complex 6884) are displayed with Cytoscape.

The fact that a protein–protein interaction was discovered in an experiment is no proof that it also exists in a protein complex. However, it is tempting to assume that for the majority of protein complexes, PPI data provide a reliable prediction of intramolecular associations.
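The selection of interactions that fall within a single complex, as described in this section, can be sketched in a few lines. This is a minimal illustration, not the implementation behind the CORUM tool: it assumes binary interactions are available as unordered pairs of identifiers, and the function name and toy identifiers are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Interaction = Tuple[str, str]


def interactions_within_complex(
    subunits: Set[str], interactions: Iterable[Interaction]
) -> Tuple[Set[FrozenSet[str]], Set[str]]:
    """Split binary interactions among a complex's subunits into
    subunit-subunit pairs (undirected, deduplicated) and self-associations."""
    between: Set[FrozenSet[str]] = set()
    self_associations: Set[str] = set()
    for a, b in interactions:
        if a in subunits and b in subunits:
            if a == b:
                self_associations.add(a)
            else:
                # frozenset makes (a, b) and (b, a) the same edge
                between.add(frozenset((a, b)))
    return between, self_associations


# Toy data with placeholder identifiers (not real IntAct records).
subunits = {"S1", "S2", "S3", "S4"}
ppi = [("S1", "S2"), ("S2", "S1"), ("S1", "X9"), ("S3", "S3"), ("S2", "S4")]

edges, loops = interactions_within_complex(subunits, ppi)
print(len(edges), len(loops))
```

The resulting pairs and self-associations correspond to the edges and self-loops that a graph library such as Cytoscape.js would draw, with the subunits as nodes; interactions involving proteins outside the complex (here `X9`) are dropped.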

CONCLUSION

CORUM is a publicly available, centralized database for mammalian protein complexes based on manually curated information from scientific literature. Basic and translational researchers are provided with extensive search options to look for complexes containing their genes of interest, exhibiting a specific biological function or displaying other features. Computational biologists can download entire datasets in different formats for advanced studies. The importance of protein complex data is demonstrated by hundreds of studies that used the CORUM dataset during the last decade. These include basic research such as the inventory of the mammalian complexome as well as applied biomedical research, in particular of cancer. The CORUM 3.0 release provides a substantially enlarged dataset of mammalian complexes which is accompanied by a wider coverage of gene products that serve as subunits of protein complexes. In recent years, a growing number of studies has demonstrated that variants of gene products may have substantial effects on protein complex function. Here, we present for the first time a dataset that covers the impact of splice variants on cellular processes and diseases. In addition to the larger dataset, the CORUM 3.0 release also presents a Cytoscape-based tool for the graphical representation of known protein–protein interactions of subunits. As structural information about large protein complexes is sparse, this approach allows the prediction of structurally adjacent proteins within these large complexes. An important goal of CORUM for the future will be to obtain an even more complete representation of experimentally characterized protein complexes. To achieve this goal, we welcome the input of other researchers sending us information about novel published protein complexes not yet included in the dataset. Please contact us at andreas.ruepp@helmholtz-muenchen.de.

32 in total

1. The Fanconi anemia core complex forms four complexes of different sizes in different subcellular compartments.

Authors: Andrei Thomashevski; Anthony A High; Mary Drozd; Jeffrey Shabanowitz; Donald F Hunt; Patrick A Grant; Gary M Kupfer
Journal: J Biol Chem Date: 2004-04-13 Impact factor: 5.157

2. Proteome survey reveals modularity of the yeast cell machinery.

Authors: Anne-Claude Gavin; Patrick Aloy; Paola Grandi; Roland Krause; Markus Boesche; Martina Marzioch; Christina Rau; Lars Juhl Jensen; Sonja Bastuck; Birgit Dümpelfeld; Angela Edelmann; Marie-Anne Heurtier; Verena Hoffman; Christian Hoefert; Karin Klein; Manuela Hudak; Anne-Marie Michon; Malgorzata Schelder; Markus Schirle; Marita Remor; Tatjana Rudi; Sean Hooper; Andreas Bauer; Tewis Bouwmeester; Georg Casari; Gerard Drewes; Gitte Neubauer; Jens M Rick; Bernhard Kuster; Peer Bork; Robert B Russell; Giulio Superti-Furga
Journal: Nature Date: 2006-01-22 Impact factor: 49.962

Review 3. Methods for the detection and analysis of protein-protein interactions.

Authors: Tord Berggård; Sara Linse; Peter James
Journal: Proteomics Date: 2007-08 Impact factor: 3.984

4. CORUM: the comprehensive resource of mammalian protein complexes--2009.

Authors: Andreas Ruepp; Brigitte Waegele; Martin Lechner; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Gisela Fobo; Goar Frishman; Corinna Montrone; H-Werner Mewes
Journal: Nucleic Acids Res Date: 2009-11-01 Impact factor: 16.971

5. Nuclear and nuclear envelope localization of dystrophin Dp71 and dystrophin-associated proteins (DAPs) in the C2C12 muscle cells: DAPs nuclear localization is modulated during myogenesis.

Authors: R González-Ramírez; S L Morales-Lázaro; V Tapia-Ramírez; D Mornet; B Cisneros
Journal: J Cell Biochem Date: 2008-10-15 Impact factor: 4.429

6. Different EZH2-containing complexes target methylation of histone H1 or nucleosomal histone H3.

Authors: Andrei Kuzmichev; Thomas Jenuwein; Paul Tempst; Danny Reinberg
Journal: Mol Cell Date: 2004-04-23 Impact factor: 17.970

7. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.

Authors: Nevan J Krogan; Gerard Cagney; Haiyuan Yu; Gouqing Zhong; Xinghua Guo; Alexandr Ignatchenko; Joyce Li; Shuye Pu; Nira Datta; Aaron P Tikuisis; Thanuja Punna; José M Peregrín-Alvarez; Michael Shales; Xin Zhang; Michael Davey; Mark D Robinson; Alberto Paccanaro; James E Bray; Anthony Sheung; Bryan Beattie; Dawn P Richards; Veronica Canadien; Atanas Lalev; Frank Mena; Peter Wong; Andrei Starostine; Myra M Canete; James Vlasblom; Samuel Wu; Chris Orsi; Sean R Collins; Shamanta Chandran; Robin Haw; Jennifer J Rilstone; Kiran Gandi; Natalie J Thompson; Gabe Musso; Peter St Onge; Shaun Ghanny; Mandy H Y Lam; Gareth Butland; Amin M Altaf-Ul; Shigehiko Kanaya; Ali Shilatifard; Erin O'Shea; Jonathan S Weissman; C James Ingles; Timothy R Hughes; John Parkinson; Mark Gerstein; Shoshana J Wodak; Andrew Emili; Jack F Greenblatt
Journal: Nature Date: 2006-03-22 Impact factor: 49.962

8. GENCODE: the reference human genome annotation for The ENCODE Project.

Authors: Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal: Genome Res Date: 2012-09 Impact factor: 9.043

9. CYGD: the Comprehensive Yeast Genome Database.

Authors: U Güldener; M Münsterkötter; G Kastenmüller; N Strack; J van Helden; C Lemer; J Richelles; S J Wodak; J García-Martínez; J E Pérez-Ortín; H Michael; A Kaps; E Talla; B Dujon; B André; J L Souciet; J De Montigny; E Bon; C Gaillardin; H W Mewes
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. CORUM: the comprehensive resource of mammalian protein complexes.

Authors: Andreas Ruepp; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Goar Frishman; Corinna Montrone; Michael Stransky; Brigitte Waegele; Thorsten Schmidt; Octave Noubibou Doudieu; Volker Stümpflen; H Werner Mewes
Journal: Nucleic Acids Res Date: 2007-10-26 Impact factor: 16.971
