Alice Bossi¹, Ben Lehner. ¹EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, UPF, Barcelona, Spain.
Abstract
A protein interaction network describes a set of physical associations that can occur between proteins. However, within any particular cell or tissue only a subset of proteins is expressed and so only a subset of interactions can occur. Integrating interaction and expression data, we analyze here this interplay between protein expression and physical interactions in humans. Proteins only expressed in restricted cell types, like recently evolved proteins, make few physical interactions. Most tissue-specific proteins do, however, bind to universally expressed proteins, and so can function by recruiting or modifying core cellular processes. Conversely, most 'housekeeping' proteins that are expressed in all cells also make highly tissue-specific protein interactions. These results suggest a model for the evolution of tissue-specific biology, and show that most, and possibly all, 'housekeeping' proteins actually have important tissue-specific molecular interactions.
Nearly all processes in biology are dependent on the precise physical interactions among many individual proteins. These range from the maintenance of cellular architecture and the propagation of the genetic material, to the ability of cells to process and respond to environmental information. Defining a near-complete map of the physical interactions that can occur between human proteins, the human protein ‘interactome’, is an important ambition of current research. Similar to the sequence of the human genome, the human interactome serves as a resource for researchers and can be used to understand how proteins are organized to perform functions within a cell (Bork ; Cusick ).
Protein interactome mapping projects were pioneered in model organisms (Uetz ; Walhout ; Ito ; Ho ; Li ; Gavin ; Krogan ), with initial efforts in humans focused on particular pathways or genomic regions (Bouwmeester ; Lehner and Sanderson, 2004; Lehner ; Jeronimo ). More recently, the cloning of large sets of human open reading frames and improvements in interaction assays have allowed these efforts to be expanded by an order of magnitude to the scale of the human proteome (Rual ; Stelzl ; Ewing ). These data, combined with extensive efforts to collate known interactions from the scientific literature (Bader ; Xenarios ; Pagel ; Persico ; Stark ; Kerrien ; Vastrik ; Ruepp ), mean that there is now a reasonably extensive resource of known human protein interactions (Hart ).
A global interactome network provides an overview of all of the physical interactions that can occur between human proteins. However, very little is known about when and where each of these interactions can occur. Within any particular cell or tissue of the human body not all protein interactions can occur. Most simply, if two genes are not both expressed in a cell, then an interaction between their protein products cannot occur.
In unicellular organisms, one approach that has been used to investigate the dynamics of interaction networks between cellular states has been to integrate interactome data with expression data. This approach has been used to identify co-regulated interaction modules (Ihmels ; Komurov and White, 2007) or to investigate the relationships between interaction network topology and gene co-expression (Han ). Additional studies have used gene expression (Luscombe ; de Lichtenberg ) or functional information (Rachlin ) to investigate the cellular conditions (or ‘context’) under which interactions can occur, and to distinguish between condition-dependent and condition-independent interactions.
In the present study, we apply a similar approach to the human protein interaction network, using global gene expression data to identify the human cells and tissues in which each interaction can or cannot occur. By performing this analysis, we are able to investigate the relationship between the tissue specificity of a protein and its number of interaction partners. Moreover, and strikingly, we find extensive communication between universally expressed proteins and those with tissue-specific expression. Even the most tissue-specific proteins normally interact directly with components of the core cellular machinery. Conversely, nearly all universally expressed ‘housekeeping’ proteins have protein interactions that can only occur in a restricted subset of cells. Our results suggest a model for the evolution of tissue-specific functions through the modification and re-use of core cellular processes, and that most ‘housekeeping’ proteins should probably be considered as important for tissue-specific processes.
Results
Construction of a global human protein interaction network
To construct a global human physical protein interaction network, we integrated data from 21 different sources to define a network of 80 922 physical interactions that can occur between 10 229 human proteins. We only included interactions supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins (see Materials and methods; Supplementary Table 1). Moreover, to account for differences in interaction assay reliability, throughout this work, we also consider a high-confidence subset of this global network that consists of interactions reported in at least two independent primary research publications. There are a total of 13 102 of these multiple publication-supported interactions that connect 4750 human proteins.
Determining the tissue specificity of human protein interactions
We then used gene expression data (Su ) to determine the cells and tissues of the human body in which each of these interactions can occur (Figure 1A). If two genes are co-expressed in a cell, then under some condition their products can physically interact in that cell. However, if the two proteins are not both expressed in a tissue, then the interaction cannot occur in that tissue. The complete set of interactions, their supporting evidence, and the cells and tissues in which each interaction can occur are provided as Supplementary Table 1 as a resource for researchers interested in the biology of any particular human cell or tissue.
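As an illustration only, the co-expression rule described above can be sketched as a set intersection: an interaction is assigned to the tissues in which both partners are called as expressed. This is a minimal sketch and not the pipeline used in this study; the function name `interaction_tissues`, the gene names, and the tissue names are all hypothetical.

```python
def interaction_tissues(expressed_in, interactions):
    """Map each interaction to the tissues in which both partners are expressed.

    expressed_in: dict mapping a gene identifier to the set of tissues in
                  which it is called as expressed (hypothetical input format).
    interactions: iterable of (gene_a, gene_b) pairs.
    """
    result = {}
    for a, b in interactions:
        # A gene with no expression data contributes an empty set, so the
        # interaction is not assigned to any tissue.
        result[(a, b)] = expressed_in.get(a, set()) & expressed_in.get(b, set())
    return result


# Toy data (assumed, not taken from the study).
expressed_in = {
    "GENE_A": {"liver", "brain", "heart"},
    "GENE_B": {"brain"},
    "GENE_C": {"liver", "heart"},
}
interactions = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]

local = interaction_tissues(expressed_in, interactions)
```

In this toy example the GENE_A/GENE_B interaction is possible only in brain, and the GENE_B/GENE_C interaction is not possible in any tissue, because the two partners are never co-expressed.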
Figure 1
Tissue-specific and recently evolved proteins make few protein interactions. (A) Integrating protein interaction and expression data to construct ‘local' interactomes for human cells and tissues. (B, C) The relationship between protein interaction degree and protein expression breadth (the number of tissues in which a protein is expressed) for the complete human protein interaction network (B), and (C) for ancestral (pre-metazoan) proteins (blue) and for metazoan-specific proteins (red). P<10e−15 in all cases, Kolmogorov–Smirnov test. Bars indicate one standard error. Interaction degree is the maximum number of co-expressed interaction partners. The same analysis is performed for the multiple-support network and for a network without protein complex-derived interactions in Supplementary Figure 1.
Tissue-specific and recently evolved proteins make few protein interactions
We first examined the relationship between the tissue specificity of a protein and the number of interactions that it makes (a protein's interaction degree). We find that more tissue-specific proteins make fewer interactions than widely expressed proteins (Figure 1B, Spearman's rho=0.19, P<2.2e−16). This is true both for the complete and for the multiple-support interaction dataset (Supplementary Figure 1A), and when excluding all protein complexes (Supplementary Figure 1B). It has been shown earlier that tissue-specific proteins are more likely to be recent evolutionary innovations than universally expressed proteins (Lehner and Fraser, 2004b). We find that more-recently evolved proteins have fewer interactions than ancient proteins, but that the relationship between tissue specificity and interaction degree is seen for both sets of proteins (Figure 1C). That is, the older a protein is, and the more tissues in which it is expressed, the more protein interactions it is likely to have.
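The two quantities compared in this analysis can be sketched as follows, taking expression breadth as the number of tissues in which a protein is expressed and interaction degree as the maximum number of co-expressed interaction partners in any single tissue (as defined in the legend of Figure 1). This is a hedged sketch under assumed input formats. The function name and toy data are hypothetical, and excluding self-interactions is our assumption, not a detail stated in the text.

```python
from collections import defaultdict


def coexpressed_degree(expressed_in, interactions):
    """Maximum number of co-expressed interaction partners over all tissues."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # assumption: self-interactions are not counted
            partners[a].add(b)
            partners[b].add(a)

    degree = {}
    for protein, tissues in expressed_in.items():
        best = 0
        for tissue in tissues:
            # Count the partners that are expressed in this same tissue.
            n = sum(1 for p in partners[protein]
                    if tissue in expressed_in.get(p, set()))
            best = max(best, n)
        degree[protein] = best
    return degree


# Toy data (assumed).
expressed_in = {
    "A": {"liver", "brain"},
    "B": {"brain"},
    "C": {"liver"},
    "D": {"brain"},
}
interactions = [("A", "B"), ("A", "C"), ("A", "D")]

degree = coexpressed_degree(expressed_in, interactions)
breadth = {protein: len(tissues) for protein, tissues in expressed_in.items()}
```

Here protein A has three partners in total but a degree of two, because at most two of them (B and D, in brain) are expressed in the same tissue as A. A rank correlation between `breadth` and `degree` could then be computed, for example with `scipy.stats.spearmanr`.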
The most tissue-specific proteins normally interact with core cellular components
We next analyzed the extent to which tissue-specific proteins interact with the most widely expressed proteins. We find that even when only considering the most tissue-restricted proteins (proteins expressed in ⩽10/79 tissues), most of them are known to interact directly with universally expressed human proteins (Figure 2A). The same result is seen when only considering high-confidence human protein interactions (Supplementary Figure 2A), and when using diverse definitions of universally expressed proteins (Figure 2A). Thus, most tissue-specific proteins can function by directly contacting components of the core cellular machinery.
Figure 2
Most tissue-specific proteins interact with core cellular components, and most housekeeping proteins have tissue-specific physical interactions. (A) The proportion of the most tissue-specific proteins (proteins expressed in only 1–10/79 tissues) that interact with universally expressed housekeeping proteins. (B) The percentage of housekeeping proteins that interact with non-housekeeping proteins. These data are for the complete network. The same analysis is shown for the high-confidence multiple-support network in Supplementary Figure 2. Housekeeping proteins are defined by 10 criteria: (1) this study 79/79 tissues, (2) this study 71–79 tissues, (3) this study 79/79 tissues with reduced expression stringency, (4) this study 71–79 tissues with reduced stringency, (5) this study 79/79 tissues with increased stringency, (6) this study 71–79 tissues with increased stringency, (7) Zhu et al microarray data 18/18 tissues, (8) Zhu et al microarray data 16–18 tissues, (9) Zhu et al EST data 18/18 tissues, (10) Zhu et al EST data 16–18 tissues (Zhu ). (C) Many proteins make interactions that can only occur in a subset of the tissues in which they are expressed. The number of tissues in which the interactions of a protein can occur is compared with the number of tissues in which a protein is expressed for proteins falling into each of the eight bins of tissue specificity. Data are shown for the complete network. Data for the filtered multiple-support network and reduced and increased stringency expression thresholds are shown in Supplementary Figure 3.
Most universally expressed proteins have tissue-specific protein interactions
Constitutively expressed proteins are often considered as important for ‘housekeeping’ biological processes that are required in all cells. However, nearly all of the most widely expressed proteins have interactions with other proteins that are not themselves universally expressed (Figure 2B). That is, most universally expressed proteins have physical interactions that can only occur in a restricted subset of cells and tissues. The same result is seen when using the complete interaction dataset, when only considering high-confidence interactions described in multiple independent publications (Supplementary Figure 2B), or when using diverse definitions of universally expressed proteins (Figure 2B). Thus most, and possibly all, universally expressed proteins have tissue-specific molecular interactions.
Proteins that themselves have restricted expression patterns also have many interactions that can only occur in a subset of the tissues in which they are expressed (Figure 2C). That is, as a consequence of interactions between more and less widely expressed proteins, human protein interactions are often more tissue specific than proteins (P<10^−16).
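The kind of proportion plotted in Figure 2B can be sketched as the fraction of housekeeping proteins in the network that have at least one interaction partner outside the housekeeping set. This is an illustrative sketch only. The function name and toy identifiers are hypothetical, and it assumes that every partner absent from the housekeeping set counts as non-housekeeping.

```python
def fraction_with_restricted_partner(housekeeping, interactions):
    """Fraction of housekeeping proteins with a non-housekeeping partner."""
    hk = set(housekeeping)
    has_restricted = set()
    for a, b in interactions:
        if a in hk and b not in hk:
            has_restricted.add(a)
        if b in hk and a not in hk:
            has_restricted.add(b)

    # Only housekeeping proteins that appear in the network are counted.
    in_network = {p for pair in interactions for p in pair} & hk
    if not in_network:
        return 0.0
    return len(has_restricted) / len(in_network)


# Toy data (assumed): H* are housekeeping proteins, T1 is tissue restricted.
housekeeping = {"H1", "H2", "H3"}
interactions = [("H1", "T1"), ("H1", "H2"), ("H2", "H3")]

fraction = fraction_with_restricted_partner(housekeeping, interactions)
```

In the toy network only H1 contacts a tissue-restricted protein, so one of the three housekeeping proteins has an interaction that cannot occur in every tissue.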
Extensive re-use of housekeeping proteins for tissue-specific biological processes
To further illustrate how housekeeping proteins are widely re-used for tissue-specific biological processes, we considered neuronal protein complexes that function in synaptic transmission, learning, and memory. The subunits of these complexes have been identified by extensive proteomic approaches, and the importance of individual subunits for learning and memory has been validated by genetic studies in mice and by clinical studies in humans (Pocklington ). We estimate that ∼20–60% of the subunits of these neuronal-specific complexes are actually universally expressed housekeeping proteins (Figure 3A and B). Moreover, in ∼30% of cases, these housekeeping subunits have genetically verified roles in learning and memory (Figure 3C). Thus, universally expressed proteins, through their tissue-specific interactions, can be re-used for, and can be essential to, highly tissue-specific biological processes.
Figure 3
The re-use of housekeeping proteins for tissue-specific functions. Here we use the example of neurotransmitter receptor protein complexes identified by affinity purification followed by mass spectrometry (Pocklington ). (A) A section of the binary protein interaction network of neurotransmitter receptor complexes, with subunits marked as universally expressed (housekeeping) proteins (yellow) and non-housekeeping (blue). The housekeeping and non-housekeeping interaction partners of the housekeeping protein Rac1 are highlighted and labeled as examples. (B) The percentage of subunits of neurotransmitter receptor protein complexes considered as universally expressed housekeeping proteins is shown for 10 different criteria of housekeeping proteins, as described in Figure 2. Criterion 10 is used in panel A. (C) The proportion of these housekeeping subunits that have been experimentally verified as essential for learning and memory in mouse models or that are implicated in psychiatric disease in humans is shown for the same 10 criteria of housekeeping proteins. Protein complex subunits, binary protein–protein interactions, and genetic data are all from (Pocklington ). The network in (A) was visualized using Biolayout Express (3D) (Freeman ).
Discussion
The evolution of tissue-specific biological processes
Taken together, our findings suggest the following model for the evolution of tissue-specific functions. Many (but not all) tissue-specific proteins are recent evolutionary innovations (Lehner ). In general, these tissue-specific proteins initially make few interactions, and these interactions are frequently with much more widely expressed and ‘housekeeping’ components of the cell. Thus, many tissue-specific proteins probably function by directly recruiting or modifying the activities of core cellular components.
There are, however, exceptions to this trend, with some tissue-specific proteins acting as ‘local’ hubs in the interaction network of a particular tissue (our unpublished observation).
Frequent re-use of housekeeping proteins for tissue-specific biology
Universally expressed ‘housekeeping’ proteins tend to make many interactions. Many of these interactions (∼50–60%, Supplementary Figure 3) are with other housekeeping proteins. However, the majority of universally expressed proteins also make interactions that can only occur in a subset of the tissues in which they are expressed. Therefore, there appears to be very frequent, and possibly universal, re-use of ‘housekeeping’ proteins to perform tissue-specific biological processes. That is, most housekeeping proteins can be considered to be important for different (or at least modified) biological processes in different tissues.
In summary, our results suggest that it might be better to consider the biology of any particular tissue in terms of the particular interactions that can occur in that tissue, rather than simply in terms of the unique proteins that are expressed there.
The importance of interaction network dynamics
In unicellular yeast, broadly expressed proteins can have precisely temporally regulated activities because of their interactions with proteins with restricted expression profiles (de Lichtenberg ). We show here that a similar process may be widely used in multicellular organisms to restrict and modify the activities of a protein to a subset of the tissues in which it is expressed.
Together with earlier analyses in yeast (Han ; Luscombe ; de Lichtenberg ), this work highlights the importance of considering global interaction networks as having dynamic, not static, structures and topologies. Additional work analyzing how the networks of molecular interactions change between cell types, states, and conditions should prove a fruitful approach for understanding living systems.
Materials and methods
Protein interaction data
We compiled human protein interactions from a total of 21 different databases, as listed in Table 1. We required that each interaction be supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins, and removed all interactions that did not meet this criterion. All interactions were mapped to common Ensembl gene identifiers. The complete network (‘CRG-all’) consists of 80 922 interactions between 10 229 human proteins (approximately half the human proteome) and is available as Supplementary Table 1.
Table 1
Human protein interaction datasets used to construct or support the integrated human interaction network
(a) Conserved co-expression, co-citation, or evolutionary conservation data are only used in the final network as additional supporting evidence. All interactions must have at least one piece of experimental binding evidence to be included in the final dataset, or physical binding evidence from at least two publications to be included in the multiple-support network.
Filtered interaction dataset
In total, 13 102 of the interactions in our network between 4750 proteins are supported by experimental evidence of physical binding reported in at least two different primary research publications. Given the multiple lines of evidence supporting these interactions, we use this subset of interactions (‘CRG-filtered') as high-confidence interactions to confirm that our conclusions are not affected by interaction data quality or sampling (see Supplementary Figures).
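The multiple-publication filter can be sketched as counting the distinct publications that report each unordered protein pair. This is a minimal sketch under an assumed input format of (protein, protein, publication identifier) records; the function name and the toy records are hypothetical and do not reproduce the actual data processing.

```python
from collections import defaultdict


def multiple_support(evidence, min_publications=2):
    """Return pairs reported in at least `min_publications` distinct publications.

    evidence: iterable of (protein_a, protein_b, publication_id) records
              (hypothetical format).
    """
    publications = defaultdict(set)
    for a, b, pub_id in evidence:
        # Sort the pair so that A-B and B-A count as the same interaction.
        pair = tuple(sorted((a, b)))
        publications[pair].add(pub_id)
    return {pair for pair, ids in publications.items()
            if len(ids) >= min_publications}


# Toy records (assumed).
evidence = [
    ("A", "B", "pub1"),
    ("B", "A", "pub2"),   # same pair, second independent publication
    ("A", "C", "pub1"),
    ("A", "C", "pub1"),   # duplicate record from the same publication
]

filtered = multiple_support(evidence)
```

Using a set of publication identifiers per pair means that the same publication imported from several databases is counted once, which matters when integrating many overlapping sources.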
Expression data
To identify which protein interactions can occur in a particular cell or tissue type, we used global gene expression data. Although interactions can be regulated by localization, phosphorylation, etc., we aim to distinguish the proteins that can interact under some condition in a tissue from those that cannot, and mRNA expression is a reasonable indicator of this potential. We used expression data from the GNF Atlas project that measured expression across 79 different human cell or tissue types (Su ). The MAS5 normalized expression levels were averaged between experimental replicates, and in cases where more than one probe set was present for a gene, the more sensitive probe set was used. In this dataset, a gene is considered as present in a tissue if its normalized expression level is >200 (Su ). However, our conclusions remain the same when this stringency is increased or decreased (see Supplementary information). At this threshold, >98% of the interaction partners in our global network for which expression information is available are co-expressed in at least one human tissue.
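The expression-calling steps can be sketched as follows: average the replicates, collapse multiple probe sets to one value per gene, and call a gene present where the value exceeds the threshold. This is an illustration under stated assumptions, not the procedure actually used. In particular, the text does not define the 'more sensitive' probe set, so the sketch assumes, purely for illustration, that it is the probe set with the higher replicate-averaged signal in a given tissue. All names and values are hypothetical.

```python
from statistics import mean


def present_calls(probe_values, probe_to_gene, threshold=200.0):
    """Return, for each gene, the set of tissues in which it is called present.

    probe_values: {probe_set: {tissue: [replicate values]}} (hypothetical format).
    probe_to_gene: {probe_set: gene identifier}.
    """
    gene_level = {}
    for probe, by_tissue in probe_values.items():
        gene = probe_to_gene[probe]
        levels = gene_level.setdefault(gene, {})
        for tissue, replicates in by_tissue.items():
            avg = mean(replicates)  # average between experimental replicates
            # ASSUMPTION: keep the probe set with the higher averaged signal.
            levels[tissue] = max(levels.get(tissue, avg), avg)

    return {gene: {tissue for tissue, value in levels.items() if value > threshold}
            for gene, levels in gene_level.items()}


# Toy values (assumed).
probe_values = {
    "p1": {"liver": [150, 210], "brain": [300, 340]},
    "p2": {"liver": [220, 260], "brain": [100, 120]},
    "p3": {"liver": [50, 70]},
}
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}

calls = present_calls(probe_values, probe_to_gene)
```

Changing the `threshold` argument (for example to 150 or 250) gives the reduced and increased stringency calls used to check the robustness of the conclusions.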
Housekeeping proteins
We identified universally expressed housekeeping proteins using a total of 10 different criteria. First, we used the GNF Atlas data, and considered housekeeping proteins as those with an expression level above 200 in all 79 tissues, or in more than 70/79 tissues (i.e. allowing for some false-negatives). Second, we used the same two tissue criteria, but increased (250) or decreased (150) the stringency at which a gene is considered expressed. Third, we used four additional sets defined in an earlier publication—genes identified as expressed in 18/18 or at least 16/18 tissues using microarray data, and genes with the same tissue criteria but defined using expressed sequence tag (EST) data (Zhu ).
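The six GNF-based criteria can be sketched by combining three expression thresholds with two tissue-count requirements. The sketch below uses a toy dataset of four tissues, so 'all tissues' and 'all but one' stand in for the 79/79 and 71–79 criteria; with the real data the second requirement would correspond to at least 71 tissues. Function and variable names are hypothetical.

```python
def housekeeping_set(levels, threshold, min_tissues):
    """Genes whose expression exceeds `threshold` in at least `min_tissues` tissues.

    levels: {gene: {tissue: expression level}} (hypothetical format).
    """
    selected = set()
    for gene, by_tissue in levels.items():
        n_present = sum(1 for value in by_tissue.values() if value > threshold)
        if n_present >= min_tissues:
            selected.add(gene)
    return selected


# Toy expression levels across four tissues (assumed).
levels = {
    "G1": {"t1": 300, "t2": 300, "t3": 300, "t4": 300},
    "G2": {"t1": 300, "t2": 300, "t3": 300, "t4": 180},
    "G3": {"t1": 300, "t2": 100, "t3": 100, "t4": 100},
}
n_tissues = 4

criteria = {}
for threshold in (150, 200, 250):  # reduced, default, and increased stringency
    criteria[(threshold, "all")] = housekeeping_set(levels, threshold, n_tissues)
    criteria[(threshold, "most")] = housekeeping_set(levels, threshold, n_tissues - 1)
```

Gene G2 illustrates why several definitions are useful: it is housekeeping under the relaxed tissue requirement at every threshold, but under the strict requirement only when the expression threshold is lowered to 150.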
Neurotransmitter receptor complexes
Components of N-methyl-D-aspartate receptor and metabotropic receptor complexes were identified by extensive proteomic studies as described (Pocklington ). We used the 215 subunits of these complexes that could be mapped to human Ensembl gene identifiers, of which 77 have demonstrated roles in learning and memory through genetic studies in mice or are implicated in psychiatric disorders in humans (Pocklington ). We used the sets of housekeeping proteins described above to identify how many of these subunits represent universally expressed proteins.
Protein evolution
Proteins were classified as metazoan specific or pre-metazoan using the analysis of Freilich .
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Figures 1–3
Authors: P Uetz; L Giot; G Cagney; T A Mansfield; R S Judson; J R Knight; D Lockshon; V Narayan; M Srinivasan; P Pochart; A Qureshi-Emili; Y Li; B Godwin; D Conover; T Kalbfleisch; G Vijayadamodar; M Yang; M Johnston; S Fields; J M Rothberg Journal: Nature Date: 2000-02-10 Impact factor: 49.962
Authors: A J Walhout; R Sordella; X Lu; J L Hartley; G F Temple; M A Brasch; N Thierry-Mieg; M Vidal Journal: Science Date: 2000-01-07 Impact factor: 47.728
Authors: Tom C Freeman; Leon Goldovsky; Markus Brosch; Stijn van Dongen; Pierre Mazière; Russell J Grocock; Shiri Freilich; Janet Thornton; Anton J Enright Journal: PLoS Comput Biol Date: 2007-10 Impact factor: 4.475
Authors: Arun K Ramani; Zhihua Li; G Traver Hart; Mark W Carlson; Daniel R Boutz; Edward M Marcotte Journal: Mol Syst Biol Date: 2008-04-15 Impact factor: 11.429
Authors: Min Sung Joo; Won Dong Kim; Ki Young Lee; Ji Hyun Kim; Ja Hyun Koo; Sang Geon Kim Journal: Mol Cell Biol Date: 2016-06-29 Impact factor: 4.272
Authors: Antigoni Elefsinioti; Ömer Sinan Saraç; Anna Hegele; Conrad Plake; Nina C Hubner; Ina Poser; Mihail Sarov; Anthony Hyman; Matthias Mann; Michael Schroeder; Ulrich Stelzl; Andreas Beyer Journal: Mol Cell Proteomics Date: 2011-08-11 Impact factor: 5.911
Authors: David P Nusinow; Adam Kiezun; Daniel J O'Connell; Joel M Chick; Yingzi Yue; Richard L Maas; Steven P Gygi; Shamil R Sunyaev Journal: Bioinformatics Date: 2012-10-11 Impact factor: 6.937