Self-similarity of human protein interaction networks: a novel strategy of distinguishing proteins.

Emad Fadhal1, Junaid Gamieldien1, Eric C Mwambene2.   

Abstract

The successful determination of reliable protein interaction networks (PINs) in several species in the post-genomic era has facilitated the quest to understand the systems and structural properties of such networks. It is envisaged that a clearer understanding of their intrinsic topological properties would elucidate the evolutionary and biological topography of organisms. This, in turn, may inform the understanding of the aetiology of diseases. By analysing sub-networks induced by zones, which are defined by distance from the central proteins, we show that zones of human PINs display self-similarity patterns. What is observed at a global level is repeated at lower levels of inducement. Furthermore, it is observed that these levels of strength point to refinement and specialisation in these layers. This suggests that the various levels of representation in the self-similarity phenomenon offer a way of measuring and distinguishing the importance of proteins in the network. To consolidate our findings, we have also considered a gene co-expression network and a class of gene regulatory networks in the same framework. In all cases, the phenomenon is significantly evident. In particular, the truly unbiased regulatory networks show a finer level of articulation of self-similarity.

Year:  2015        PMID: 25720740      PMCID: PMC4342563          DOI: 10.1038/srep07628

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Recently, self-repeating phenomena have been observed in remarkably many systems, both natural and man-made. What piques our interest in them is often their aesthetic value more than their organising principles. In particular, long-range power-law correlations depicting self-similarities have been discovered in a remarkably wide variety of systems [1]. There have been attempts to identify the self-similarity phenomenon in biological complex systems [2,3] through some kind of re-normalisation. For instance, in biology, self-similarity has been observed in surface areas and vesicular distributions of tissues [4,5]. In respect of self-similarity of the general complex systems to which biological networks belong, the work of Song et al. [6] is seminal. They analysed a variety of real complex networks and found that these systems consist of self-repeating patterns. This result was achieved by applying a re-normalisation procedure that coarse-grains the system into boxes containing nodes within a given neighbourhood size. They identified a power-law relation between the number of boxes needed to cover the network and the size of the box, defining a finite self-similar exponent. In the precise terminology of graph theory, they found that quotients of complex networks defined by covering neighbourhoods of certain distances were also power-law. Others have used variations of the method with some notable improvements [7,8]. However, it is not surprising that coarse-grain self-similarity was weak in PINs. It has been shown that the majority of nodes (over 90% in all cases that have been considered) lie within distance 3 of the centre [9]. It is therefore to be expected that any coarse-graining beyond distance 2 from the centre would completely destroy the intrinsic power-law behaviour of the system. Coarse-graining requires that the network has a reasonable diameter and that nodes are reasonably spread around the centre.
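Written out, the box-covering relation described above takes the following form. The symbols are not given in the text and are assumed here following common usage: N_B is the minimum number of boxes of size l_B needed to cover the network, and d_B is the finite self-similar exponent.

```latex
N_B(\ell_B) \sim \ell_B^{-d_B}
```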
We have, on the other hand, looked at power-law properties of networks from a different perspective, one that is not directly comparable to the seminal work of Song et al. As has been shown elsewhere, PINs display a certain recognisable structure [9], which, for brevity, we call the stingray structure with quills in what follows. This structure has been both our point of departure and our focus. We contend that PINs are self-repeating from the stingray structural point of view. There has been an intense and deliberate effort to determine the PINs of many organisms, with notable successes. The determination of these networks is to help uncover the generic organising principles of functional cellular networks [10-19]. This progress is an important step in our understanding of the evolution and behaviour of such systems. It is envisaged that an understanding of the organizing principles of biological networks at systems level would elucidate many perplexing questions, including that of finding therapeutic targets [20-22]. Such effort is under way on many fronts. Whilst this has been the general aim, much of the recent effort has focused on finding functional dependencies amongst the so-called hubs and their topological importance and positions in the network [23,24]. There have been serious undertakings to understand both the structural and the functional systems level of protein-protein interaction (PPI) networks through graph visualisation and drawing. The most important piece of information required in visualisation is the spatial distribution of the network, and such information is calculable if networks are treated as metric spaces. Recently it has been shown that, treated as metric spaces, PINs of various organisms have what we have called, as alluded to, a stingray structure with quills. That is, proteins with high degree coagulate in the centre of the network, whilst those in the periphery have low degree, and in the fringes we find nodes of degree one [9].
Further, in that work it was shown that the observed stingray structure has significant biological implications. First, it was observed that proteins involved in sensing pathways tend to be more expressed in central zones, while those in the periphery specialise in routine metabolic pathways. Second, it was observed that some zones are uniquely enriched and represent a far more pronounced specialisation. Third, it was shown that cancer pathways are significantly over-represented in zone 2 [25]. In this article, we have analysed substructures that are defined by zones from the centre. In other words, we have statistically visualised the human PINs at both the global and the subsystem level. What has been revealed is as startling as it is aesthetic. These substructures display the same phenomenon that is played out on a global scale. The cores of human PINs are imposing self-similar structures. The systems structures and the ensuing organising principles of these human PINs are repeated at macro as well as at lower levels. In other words, if one were to consider the structure as a flower with many petals, these very petals would also have petals, which would in turn have more petals of the same kind. Moreover, in most cases, the central proteins at various levels of human PINs are from the same families, possibly playing the same biological role at every level of consideration. This repetition in similarity of centres is also observed in gene regulatory networks, albeit with a finer level of articulation [26]. When pathway and function enrichment analyses are applied to various layers of the induced subgraphs, our results show that there is reinforcement and refinement of these phenomena at various levels of consideration. Moreover, it is clear that there is increased strength in specialisation. Overall, therefore, this self-similarity phenomenon offers a natural way of understanding the biological systems mechanics of the human PINs.
As molecular networks may be biased, we also tested our method and hypothesis on truly unbiased networks, namely a gene co-expression network and transcriptional regulatory networks. Both strongly support the hypothesis, and in the regulatory networks the phenomenon is even more pronounced than in PINs. In other words, we propose that at the core of human PINs, proteins assemble in the same manner of coagulation as systems structures at all levels defined by distance throughout a given network. The key organizing features of the central zones of human PINs are repeated at the level of induced subgraphs defined by distances from the centre. Proteins interact in the same manner, varying only in scale and in refinement of functionality. This recurrence may point to another way of identifying important proteins that may have utility as drug targets.

Results

The general structure of the human PINs

We modelled the human functional protein interaction network (HFPIN) [27], which consists of 9448 nodes and 181706 interactions, and the highly curated and currently largest available human signalling network (HSN) [28,29], which consists of 6305 nodes and 62937 interactions. We also combined the HFPIN and the HSN to produce what we have called the combined human network (CHN), which consists of 10573 nodes and 210689 interactions. In addition, a new human protein interaction set (NHPIS), based on three-dimensional information together with other functional tools, has recently been predicted [30]; it consists of 7863 nodes and 23779 interactions and was equally subjected to our method. We have also modelled truly unbiased datasets: gene co-expression [31] and regulatory networks [26]. We used a formal method that finds the protein(s) with the smallest maximal distance to the other proteins in the network. The starting point is that, in all the networks under consideration, the centres were identified and nodes were grouped into zones according to their distance from these central proteins. With this classification, functional enrichment was performed and biological hypotheses were drawn [9,25]. Here, we follow the same approach in our consideration of subgraphs of these networks. Before we present the self-similarity we are alluding to, let us first summarise the pertinent features of the structure in all the biological networks that were considered. We will argue that the same pattern is evident in induced subgraphs of these networks, determined by distances from central nodes. The essence of the structure is as follows. First, the centres consist of single nodes, all heavily involved in signalling pathways [9]. The centre of the HFPIN is MAPK14 and that of the HSN is MAPK1; the combined human network has MAPK3 as its centre. Second, nodes in the central positions have higher degrees than those in the periphery.
Moreover, the degree distribution follows a power law. The third feature is that, while the diameters are generally large, the majority of proteins are located in the central positions (zone 1 to zone 3). Fourth, proteins in the periphery are of low degree, and the fringes of the network display the quill structure (nodes of degree 1). To aid in visualising these networks, we have called these imposing structures stingray structures with quills. The structures of the HFPIN, HSN, CHN and NHPIS are summarised in Tables 1 to 4.
Table 1

Metrics of induced subgraphs of HFPIN

PINNodesEdgesDiameterCentreZones around centre
     123456789 
HFPIN318170613MAPK143744610346457810414211Nodes
     863252221121Ave degree
     311111121Min degree
     5314303931462221Max degree
     01736533075612101# quills
HFPIN137348025MAPK31562133      Nodes
     1116914      Ave degree in the original network
     34186      Ave degree in the induced network
     113      Min degree
     144809      Max degree
     140      # quills
HFPIN1115515874MAPK110351       Nodes
     11894       Ave degree in the original network
     2214       Ave degree in the induced network
     22       Min degree
     7538       Max degree
     00       # quills
HFPIN1111038664MAPK1164371      Nodes
     10614316      Ave degree in the original network
     15181      Ave degree in the induced network
     211      Min degree
     38561      Max degree
     021      # quills
HFPIN24318549510HRAS15816872170262152   Nodes
     7246251346   Ave degree in the original network
     533518311   Ave degree in the induced network
     111111   Min degree
     2404222242021   Max degree
     147493104112   # quills
HFPIN2115611787NRAS85635      Nodes
     856048      Ave degree in the original network
     1991      Ave degree in the induced network
     211      Min degree
     84343      Max degree
     023      # quills
HFPIN211855753KRAS822       Nodes
     8551       Ave degree in the original network
     123       Ave degree in the induced network
     12       Min degree
     524       Max degree
     90       # quills
Table 2

Metrics of induced subgraphs of HSN

PINNodesEdgesDiameterCentreZones around centre
     123456 
HSN63056293711MAPK143235351940202384Nodes
     67237233Ave degree
     111111Min degree
     451377891195Max degree
     6401764133202# quills
HSN141849875MAPK32721423   Nodes
     78505   Ave degree in the original network
     28131   Ave degree in the induced network
     111   Min degree
     141792   Max degree
     16132   # quills
HSN1125430206PIK3CA991459   Nodes
     1196114   Ave degree in the original network
     37142   Ave degree in the induced network
     211   Min degree
     115827   Max degree
     045   # quills
HSN1119913623PIK3R1953    Nodes
     113199    Ave degree in the original network
     2621    Ave degree in the induced network
     11    Min degree
     8731    Max degree
     11    # quills
HSN22961274799AKT119815581082965 Nodes
     51381043 Ave degree in the original network
     3226621 Ave degree in the induced network
     11111 Min degree
     2281876091 Max degree
     844241635 # quills
HSN211697287AKT227834632 Nodes
     8161472427 Ave degree in the original network
     178621 Ave degree in the induced network
     41112 Min degree
     38352632 Max degree
     010820 # quills
HSN21125844PDPK11572   Nodes
     1014842   Ave degree in the original network
     741   Ave degree in the induced network
     211   Min degree
     1361   Max degree
     012   # quills
Table 3

Metrics of induced subgraphs of CHN

PINNodesEdgesDiameterCentreZones around centre
     123456789 
CHN1057321068913MAPK354260113352367614111Nodes
     953449221111Ave degree
     111111111Min degree
     5904313941261111Max degree
     1339831212404111# quills
CHN153089785MAPK13701545      Nodes
     1096837      Ave degree in the original network
     38202      Ave degree in the induced network
     111      Min degree
     214874      Max degree
     772      # quills
CHN1136258114MAPK14166195       Nodes
     16068       Ave degree in the original network
     4321       Ave degree in the induced network
     51       Min degree
     13796       Max degree
     03       # quills
CHN11116623363MAPK893693      Nodes
     17214721      Ave degree in the original network
     30244      Ave degree in the induced network
     433      Min degree
     84895      Max degree
     000      # quills
CHN25503725028PRKACA270249025431737    Nodes
     71512153    Ave degree in the original network
     53381321    Ave degree in the induced network
     11111    Min degree
     307420135361    Max degree
     642626957    # quills
CHN2123526477CSNK1E805868144    Nodes
     8994792918    Ave degree in the original network
     5210644    Ave degree in the induced network
     21113    Min degree
     72322995    Max degree
     021240    # quills
CHN2117419544CSNK1D676       Nodes
     9381       Ave degree in the original network
     572       Ave degree in the induced network
     51       Min degree
     676       Max degree
     02       # quills
Table 4

Metrics of induced subgraphs of NHPIS

PINNodesEdgesDiameterCentreZones around centre
     1234567 
NHPIS78632377914SNW1532223136608921594914Nodes
     141132221Ave degree
     1111111Min degree
     4774584716961Max degree
     2537413153501093214# quills
NHPIS14419117CDC5L305102261   Nodes
     3521   Ave degree
     1111   Min degree
     575131   Max degree
     12120191   # quills
NHPIS111421649SRRM235173271  Nodes
     24111  Ave degree
     11111  Min degree
     1514421  Max degree
     1762441  # quills
NHPIS11112133TADA2A71     Nodes
     11     Ave degree
     11     Min degree
     21     Max degree
     51     # quills
NHPIS2171360529ESR1139681774767  Nodes
     1410311  Ave degree
     11111  Min degree
     721264041  Max degree
     452222517  # quills
NHPIS21901456SP110322581  Nodes
     64211  Ave degree
     21111  Min degree
     10101121  Max degree
     0101051  # quills
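The per-zone rows reported in Tables 1 to 4 (nodes, average, minimum and maximum degree, and number of quills) could be computed along the following lines. This is a minimal sketch, not the authors' code: the function name `zone_metrics`, the adjacency-dictionary representation and the toy network are assumptions made for illustration, and degrees are taken in whichever graph is passed in.

```python
from collections import deque

def zone_metrics(adj, centre):
    """Per-zone node count, degree statistics and quill count around `centre`.

    `adj` maps each node to the set of its neighbours (undirected, unweighted).
    Degrees are measured in `adj` itself, i.e. in whichever graph is passed in.
    """
    # Breadth-first search gives the hop distance of every node from the centre.
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Collect the degrees of the nodes in each zone (zone 0 is the centre itself).
    degrees_by_zone = {}
    for node, d in dist.items():
        if d > 0:
            degrees_by_zone.setdefault(d, []).append(len(adj[node]))

    return {
        zone: {
            "nodes": len(degs),
            "ave_degree": sum(degs) / len(degs),
            "min_degree": min(degs),
            "max_degree": max(degs),
            "quills": sum(1 for k in degs if k == 1),  # nodes of degree 1
        }
        for zone, degs in sorted(degrees_by_zone.items())
    }

# Hypothetical toy network: a path a-b-c-d-e with an extra leaf f attached to c.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "f"},
    "d": {"c", "e"}, "e": {"d"}, "f": {"c"},
}
metrics = zone_metrics(toy, "c")
```

In the toy network, zone 1 around `c` contains three nodes, one of which is a quill, and zone 2 consists of two quills.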

Central zones of human PINs as induced subgraphs repeat the structure that is observed in the whole network

The key feature of self-similarity is self-repeating patterns at various levels of consideration. In our case, we reveal that all the networks we dealt with split into smaller parts that resemble the whole from a graph-structural point of view. We split the graphs into parts that are defined by the zones from the centre, i.e., we look at the graphs induced by the nodes in zone i from the centre, where i is 1, 2 or 3. We examine their structure as was done for the global graphs, following closely what was done in our recent work [9]. We show that the structures we observe have similar patterns. What is striking is that the centres of these substructures have similar functions and belong to the same families. When we examine the repeating substructures of the giant graphs, in all cases the induced subgraphs of zones 1 and 2 each have a single node as centre, and these centres are from the same family as the centres of the human PINs. In the first zones, they are from the MAPK family, in both the HFPIN and the HSN. In zone 2, the centres of the induced subgraphs of the HFPIN are from the RAS family; those of the HSN are from the general kinase family. The next natural consideration was to look at zones formed from the zones in the first instance, to describe the self-similarity phenomenon. We considered the subset of proteins that form a particular zone, together with their interactions amongst themselves, as a separate induced subgraph. Again, the same phenomenon was observed, with varying degree of connectivity and level of manifestation of this organizing principle, depending on the distance of the zone from the centre. In all the induced subgraphs, we observed the same organizing principles: nodes with high degree coagulate in central positions and those with low degree are in the periphery of the graphs. Of particular importance, the degree distributions of proteins in these induced subgraphs follow similar patterns (see supplementary figures S1 to S7).
The centre of the whole graph is MAPK14 for the HFPIN and MAPK1 for the HSN. For the HFPIN, the centre of the induced subgraph of nodes in the first zone is MAPK3. When one considers the zone 1 nodes around MAPK3, the centre is MAPK1, whose zone 1 subgraph in turn has centre MAPK11 (table 1). That is, we repeatedly look at induced subgraphs of induced subgraphs. While the level of expression may weaken as we consider the induced subgraphs of these subgraphs, the centres at zones 1 all belong to the MAPK family, a critical family of proteins in signalling. The same is observed for the HSN (table 2). It is not particularly surprising that, since the combined human network has more data, the features of the self-similarity are more pronounced there (table 3). This repetition is also observed in zones 2 of the human PINs. The centres are from the RAS family for the HFPIN and the AKT family for the HSN (tables 1 and 2). Both of these families are heavily implicated in cancer pathways [32,33].
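The repeated step of taking induced subgraphs of induced subgraphs can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the toy network are hypothetical, breadth-first search is used for distances, zones are measured from the first centre when there are several, and, because an induced zone need not be connected, only its largest connected component is analysed (the text does not say how disconnected pieces were handled).

```python
from collections import deque

def bfs(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def induced(adj, nodes):
    """Subgraph on `nodes`, keeping only the edges among them."""
    keep = set(nodes)
    return {u: adj[u] & keep for u in adj if u in keep}

def largest_component(adj):
    """Assumption of this sketch: analyse only the largest connected piece."""
    seen, best = set(), set()
    for u in adj:
        if u not in seen:
            comp = set(bfs(adj, u))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return induced(adj, best)

def centres(adj):
    """Nodes of a connected graph whose greatest distance to any node is smallest."""
    ecc = {u: max(bfs(adj, u).values()) for u in adj}
    smallest = min(ecc.values())
    return sorted(u for u in adj if ecc[u] == smallest)

def nested_centres(adj, zone=1, depth=3):
    """Centres of the graph, of its induced `zone` subgraph, and so on."""
    found = []
    for _ in range(depth):
        graph = largest_component(adj)
        if not graph:
            break
        found.append(centres(graph))
        # Zones are measured from the first centre (a single centre is the usual case).
        dist = bfs(graph, found[-1][0])
        members = [u for u in graph if dist[u] == zone]
        if len(members) < 2:
            break
        adj = induced(graph, members)
    return found

# Hypothetical toy network: hub h, with a denser cluster among its neighbours.
toy = {
    "h": {"a", "b", "c", "d", "e"}, "a": {"h", "b", "c", "d"},
    "b": {"h", "a", "c"}, "c": {"h", "a", "b"}, "d": {"h", "a"}, "e": {"h"},
}
levels = nested_centres(toy)
```

For the toy network, the successive zone 1 centres are `h`, then `a`, then the tied pair `b` and `c`, mirroring on a small scale the chain of centres read off Table 1.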

Biological ramifications of the self-similarity structure in the HFPIN and similar networks

It has recently been observed that there is some level of specialisation by proteins in the various zones of the HFPIN [25]. While some pathways cut across zones, sensing pathways are far more pronounced in central zones than in the periphery. Zones in the periphery tend to be involved in gene expression and metabolic pathways more than those in the centre. In addition, it was observed that zone 2 bears the brunt of pathways involved in cancers. It is therefore natural to ask how this phenomenon plays out, in biological terms, from the point of view of the self-repeating topology described in this article. What emerges is that there seems to be some level of strengthening in terms of pathways. Four issues are worth noting. First, the fact that some zones have uniquely enriched pathways is a clear indication that those zones have a strong representation of proteins associated with such pathways. Consider, for instance, the TRAF6 Mediated Induction of Proinflammatory Cytokines pathway, which, across the entire HFPIN, is uniquely enriched in zone 1. In zone 1 of the induced subgraph of zone 1, the percentage of proteins involved in this pathway increases from 10.4% to 20.5%. In the second layer (zone 1 of zone 1 of zone 1), the percentage increases to 26.2%. At the next level, it increases to 28.1%. This indicates that, as one moves into deeper levels, there is a coagulation of proteins that are highly specialised in specific pathways (table 5).
Table 5

Summary of increases in percentage of pathways as one moves into deeper levels of HFPIN1

Enriched pathways | Zone 1 of HFPIN | Zone 1 of HFPIN1 | Zone 1 of HFPIN11 | Zone 1 of HFPIN111
Signal transduction | 38.1% | 52% | 52% | 42.1%
Immune system | 31.3% | 48% | 55.3% | 54.6%
MAPK signalling pathway | 26.6% | 35.8% | 48.5% | 54.6%
Pathways in cancer | 22% | 26.2% | 31% | 18.7%
TRAF6 Mediated Induction of proinflammatory cytokines | 10.4% | 20.5% | 26.2% | 28.1%
Second, this phenomenon of strengthening is not restricted to uniquely enriched pathways. Consider the top four pathways in zone 1: signal transduction (38.1%), immune system (31.3%), MAPK signalling (26.6%) and pathways in cancer (22%). At the third level of consideration (zone 1 of zone 1 of zone 1), the order changes: immune system (55.3%), signal transduction (52%), MAPK signalling (48.5%) and pathways in cancer (31%). By the time the next level is considered, the MAPK signalling pathway dominates, with 54.6% (table 5). Third, some pathways are more highly represented in the periphery of central zones. For instance, it is interesting to note that signal transduction ebbs as one moves deeper into central zones of central zones, yet it still leads in zone 2 of the induced subgraph of zone 1. In zone 2 of zone 1 of the induced subgraph, signal transduction has the highest share, with 52.5% of proteins involved in this pathway (table 6).
Table 6

Summary of increases in percentage of pathways as one moves into deeper levels of HFPIN2

Enriched pathways | Zone 2 of HFPIN | Zone 1 of HFPIN2 | Zone 1 of HFPIN21 | Zone 1 of HFPIN211
Signal transduction | 51.2% | 52% | 53% | 52.5%
Immune system | 32.6% | 48.1% | 55.3% | 45%
MAPK signalling pathway | 14.1% | 35.9% | 48.5% | 54.7%
Pathways in cancer | 28.2% | 26.3% | 31.1% | 45%
Finally, while it was noted that zones in the periphery have a tendency to diversify in metabolic functions, it is important to note that such pathways are ubiquitous. However, they are more enriched in the periphery of zones of central zones. Consider, for instance, gene expression, metabolism and membrane trafficking. In the induced subgraph of zone 1, the gene expression pathway is uniquely enriched in zone 2, whilst in the induced subgraph of zone 2, it is significant in zone 2. In the induced subgraph of zone 3, it is the main theme of the central zones. These observations are equally evident in the HSN, CHN and NHPIS (see supplementary tables S1 to S6). In summary, therefore, we see that the self-repeating structure is played out even from the biological point of view. Signalling pathways continue to be significant in central zones; routine metabolic pathways are significant in the periphery of the network, at all levels of consideration. However, consideration of the self-repeating structure renders specialisation even more prominent: there are cases where pathways are highly distinguished or uniquely enriched. Using the self-similarity structure, it is possible to group proteins in some order of importance, a theme we discuss below.
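The percentages compared across levels in this section can be illustrated with a short sketch. It assumes, as the wording "percentage of proteins involved in this pathway" suggests but does not state outright, that each figure is the share of a zone's proteins annotated to the pathway; the function name, protein identifiers and annotation set are hypothetical placeholders, not data from the study.

```python
def pathway_share(zone_proteins, pathway_members):
    """Percentage of the proteins in a zone that are annotated to a pathway.

    Assumption: the denominator is the number of proteins in the zone.
    """
    zone = set(zone_proteins)
    if not zone:
        return 0.0
    return 100.0 * len(zone & set(pathway_members)) / len(zone)

# Hypothetical nested zones (placeholder identifiers) and one pathway annotation.
nested_zones = {
    "zone 1": ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"],
    "zone 1 of zone 1": ["P1", "P2", "P3", "P4"],
    "zone 1 of zone 1 of zone 1": ["P1", "P2"],
}
pathway = {"P1", "P2", "P3"}

shares = {name: pathway_share(members, pathway) for name, members in nested_zones.items()}
```

In this toy case the share rises from 37.5% to 75% to 100% as one moves inward, which is the kind of strengthening the text describes for uniquely enriched pathways.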

Cancer pathways' zonal distribution in self-similarity terms

In our recent work, in which we considered the distribution of proteins that are consistently expressed in 13 types of cancer [25], it was shown that most of these proteins are prominent in zone 2 of the HFPIN, HSN and CHN (tables 7 to 9). Here, the same methods were applied as we analysed each of the subgraphs from each zone. While in the whole network cancer proteins are concentrated in zone 2, the critical compartment is zone 3 of zone 2 for the HFPIN (table 10) and zone 2 of zone 2 for both the HSN and the CHN (tables 11 and 12).
Table 7

Cancer pathways' zonal distribution in HFPIN

Type of cancer | # of proteins | Zone 1 (374) | Zone 2 (4610) | Zone 3 (3464) | Zone 4 (578) | Zone 5 (104)
Breast | 330 | 11 (3.3%) | 189 (57.2%) | 121 (36.6%) | 9 (2.7%) | -
Cervical | 711 | 26 (3.6%) | 425 (59.7%) | 230 (32.3%) | 23 (3.2%) | 7 (0.9%)
Endometrial | 1515 | 57 (3.7%) | 839 (55.3%) | 514 (33.9%) | 83 (5.4%) | 20 (1.3%)
Fallopian | 1292 | 49 (3.7%) | 715 (55.3%) | 446 (34.5%) | 67 (5.1%) | 14 (1%)
Glioblastoma | 1046 | 38 (3.6%) | 589 (56.3%) | 368 (35.1%) | 44 (4.2%) | 6 (0.5%)
Glioma | 1180 | 40 (3.3%) | 621 (57.7%) | 440 (37.2%) | 63 (5.3%) | 13 (1.1%)
Kidney | 561 | 14 (2.4%) | 331 (59%) | 193 (34.4%) | 23 (4%) | -
Liver | 715 | 29 (4%) | 402 (56.2%) | 247 (34.5%) | 33 (4.6%) | 4 (0.5%)
Lung | 532 | 19 (3.5%) | 314 (59%) | 175 (32.8%) | 22 (4.1%) | 2 (0.3%)
Ovarian | 775 | 26 (3.3%) | 432 (55.7%) | 279 (36%) | 32 (4.1%) | 6 (0.7%)
Pancreatic | 717 | 30 (4.1%) | 411 (57.3%) | 244 (34%) | 28 (3.9%) | 4 (0.5%)
Pituitary | 1126 | 37 (3.2%) | 591 (52.4%) | 421 (37.3%) | 61 (5.4%) | 15 (1.3%)
Rectal | 1597 | 69 (4.3%) | 861 (53.9%) | 552 (34.5%) | 90 (5.6%) | 23 (1.4%)
Average | | 3.5% | 56.5% | 34.8% | 4.4% | 0.7%
Table 8

Cancer pathways' zonal distribution in HSN

Type of cancer | # of proteins | Zone 1 (432) | Zone 2 (3535) | Zone 3 (1940) | Zone 4 (202) | Zone 5 (38)
Breast | 236 | 12 (5%) | 151 (63.9%) | 70 (29.6%) | 2 (0.8%) | 1 (0.4%)
Cervical | 533 | 42 (7.8%) | 323 (60.6%) | 157 (29.4%) | 9 (1.6%) | 1 (0.1%)
Endometrial | 1092 | 89 (8.1%) | 647 (59.2%) | 336 (30.7%) | 17 (1.5%) | 2 (0.1%)
Fallopian | 941 | 72 (7.6%) | 563 (59.8%) | 287 (30.4%) | 16 (1.7%) | 2 (0.2%)
Glioblastoma | 767 | 64 (8.3%) | 471 (61.4%) | 216 (28.1%) | 13 (1.6%) | 2 (0.2%)
Glioma | 824 | 35 (8%) | 278 (64%) | 114 (62.2%) | 5 (1.1%) | 1 (0.2%)
Kidney | 434 | 14 (2.4%) | 331 (59%) | 193 (34.4%) | 23 (4%) | -
Liver | 537 | 45 (8.3%) | 328 (61%) | 155 (28.8%) | 7 (1.3%) | 1 (0.1%)
Lung | 422 | 31 (7.3%) | 260 (61.6%) | 121 (28.6%) | 8 (1.9%) | 2 (0.4%)
Ovarian | 557 | 39 (7%) | 334 (59.9%) | 174 (31.2%) | 8 (1.4%) | 1 (0.1%)
Pancreatic | 536 | 46 (8.5%) | 332 (61.9%) | 148 (27.6%) | 8 (1.4%) | 1 (0.1%)
Pituitary | 789 | 56 (7%) | 458 (58%) | 253 (32%) | 19 (2.4%) | 2 (0.2%)
Rectal | 1162 | 95 (8.1%) | 677 (58.2%) | 365 (31.4%) | 21 (1.8%) | 3 (0.2%)
Average | | 7.5% | 60.6% | 29.6% | 1.5% | 0.2%
Table 9

Cancer pathways' zonal distribution in CHN

Type of cancer | # of proteins | Zone 1 (542) | Zone 2 (6011) | Zone 3 (3352) | Zone 4 (367) | Zone 5 (61)
Breast | 350 | 24 (6.8%) | 224 (64%) | 95 (27.1%) | 7 (2%) | -
Cervical | 760 | 43 (5.6%) | 496 (65.2%) | 203 (26.7%) | 16 (2.1%) | 2 (0.2%)
Endometrial | 1644 | 91 (5.5%) | 1007 (61.2%) | 474 (28.8%) | 61 (3.7%) | 11 (0.6%)
Fallopian | 1408 | 71 (5%) | 869 (61.7%) | 409 (29%) | 51 (3.6%) | 8 (0.5%)
Glioblastoma | 1128 | 63 (5.5%) | 719 (63.7%) | 311 (27.5%) | 30 (2.6%) | 5 (0.4%)
Glioma | 1270 | 67 (5.2%) | 765 (60.2%) | 380 (29.9%) | 48 (3.7%) | 10 (0.7%)
Kidney | 593 | 44 (7.4%) | 389 (65.5%) | 150 (25.2%) | 10 (1.6%) | -
Liver | 769 | 51 (6.6%) | 475 (61.7%) | 221 (28.9%) | 21 (2.7%) | 1 (0.1%)
Lung | 571 | 39 (6.8%) | 369 (64.6%) | 153 (26.7%) | 9 (1.5%) | 1 (0.1%)
Ovarian | 823 | 37 (4.4%) | 524 (63.6%) | 236 (28.6%) | 23 (2.7%) | 3 (0.3%)
Pancreatic | 771 | 44 (5.7%) | 483 (62.6%) | 223 (28.9%) | 21 (2.7%) | -
Pituitary | 1228 | 60 (4.8%) | 738 (60%) | 373 (30.3%) | 47 (3.8%) | 10 (0.8%)
Rectal | 1753 | 96 (5.4%) | 1061 (60.5%) | 515 (29.3%) | 70 (3.9%) | 11 (0.6%)
Average | | 5.7% | 62.7% | 28.2% | 2.8% | 0.3%
Table 10

Cancer pathway distribution in induced zone 2 of HFPIN in self-similarity terms

Type of cancer | # of proteins | Zone 1 (158) | Zone 2 (1687) | Zone 3 (2170) | Zone 4 (262) | Zone 5 (15)
Breast | 182 | 2 (1%) | 71 (39%) | 98 (53.8%) | 11 (6%) | -
Cervix | 407 | 13 (3.1%) | 147 (36.1%) | 224 (55%) | 20 (4.9%) | 3 (0.7%)
Endometrium | 790 | 28 (3.5%) | 304 (38.4%) | 407 (51.5%) | 45 (5.8%) | 5 (0.6%)
Fallopian | 673 | 17 (2.5%) | 241 (35.8%) | 374 (55.5%) | 37 (5.4%) | 4 (0.5%)
Glioblastoma | 563 | 19 (3.3%) | 220 (39%) | 290 (51.5%) | 32 (5.6%) | 2 (0.3%)
Glioma | 587 | 22 (3.7%) | 217 (36.9%) | 316 (53.8%) | 30 (5.1%) | 2 (0.3%)
Kidney | 314 | 10 (3.1%) | 130 (41.4%) | 156 (49.6%) | 14 (4.4%) | 4 (1.2%)
Liver | 381 | 13 (3.4%) | 141 (37%) | 207 (54.3%) | 17 (4.4%) | 3 (0.7%)
Lung | 299 | 5 (1.6%) | 126 (42.1%) | 148 (49.4%) | 18 (6%) | 2 (0.6%)
Ovarian | 411 | 9 (2.1%) | 151 (36.7%) | 230 (55.9%) | 19 (4.6%) | 2 (0.4%)
Pancreas | 300 | 9 (3%) | 143 (47.6%) | 126 (42%) | 19 (6.3%) | 3 (1%)
Pituitary | 569 | 19 (3.3%) | 220 (38.6%) | 301 (52.8%) | 27 (4.7%) | 2 (0.3%)
Rectal | 811 | 26 (3.2%) | 305 (37.6%) | 431 (53.1%) | 46 (5.6%) | 3 (0.3%)
Average | | 2.8% | 38.9% | 52.1% | 5.2% | 0.5%
Table 11

Cancer pathway distribution in induced zone 2 of HSN in self-similarity terms

Type of cancer | # of proteins | Zone 1 (198) | Zone 2 (1558) | Zone 3 (1082) | Zone 4 (96) | Zone 5 (5)
Breast | 136 | 7 (5.1%) | 89 (65.4%) | 33 (24.2%) | 7 (5.1%) | -
Cervical | 285 | 17 (5.9%) | 189 (66.3%) | 72 (25.2%) | 7 (2.4%) | -
Endometrial | 562 | 44 (7.8%) | 325 (57.8%) | 176 (31.3%) | 17 (3%) | -
Fallopian | 489 | 37 (7.5%) | 295 (60.3%) | 143 (29.2%) | 14 (2.8%) | -
Glioblastoma | 409 | 32 (7.8%) | 252 (61.6%) | 113 (27.6%) | 12 (2.9%) | -
Glioma | 422 | 33 (7.8%) | 259 (61.3%) | 120 (28.4%) | 10 (2.3%) | -
Kidney | 248 | 19 (7.6%) | 156 (62.9%) | 61 (24.5%) | 12 (4.8%) | -
Liver | 281 | 21 (7.4%) | 175 (62.2%) | 76 (27%) | 9 (3.2%) | -
Lung | 229 | 15 (5%) | 143 (62.4%) | 63 (27.5%) | 8 (3.4%) | -
Ovarian | 290 | 16 (5.1%) | 190 (65.5%) | 74 (25.5%) | 10 (4.3%) | -
Pancreatic | 285 | 19 (6.6%) | 189 (66.3%) | 71 (24.9%) | 6 (2.1%) | -
Pituitary | 393 | 34 (8.6%) | 242 (61.5%) | 107 (27.2%) | 10 (2.5%) | -
Rectal | 581 | 47 (8%) | 340 (58.5%) | 177 (30.4%) | 17 (2.9%) | -
Average | | 6.3% | 62.4% | 27.1% | 3.2% | -
Table 12

Cancer pathway distribution in induced zone 2 of CHN in self-similarity terms

Type of cancer | # of proteins | Zone 1 (270) | Zone 2 (2490) | Zone 3 (2543) | Zone 4 (173) | Zone 5 (7)
Breast | 212 | 7 (3.1%) | 97 (45.7%) | 104 (49%) | 4 (1.8%) | -
Cervical | 470 | 18 (3.8%) | 257 (54.6%) | 184 (39.1%) | 11 (2.3%) | -
Endometrial | 939 | 35 (3.7%) | 516 (54.9%) | 362 (38.5%) | 26 (2.7%) | -
Fallopian | 810 | 33 (4%) | 448 (55.3%) | 306 (37.7%) | 23 (2.8%) | -
Glioblastoma | 676 | 30 (4.4%) | 361 (53.4%) | 267 (39.4%) | 18 (2.6%) | -
Glioma | 721 | 35 (2.8%) | 383 (53.1%) | 280 (38.8%) | 23 (3.1%) | -
Kidney | 371 | 15 (4%) | 172 (46.3%) | 173 (46.6%) | 11 (2.9%) | -
Liver | 451 | 17 (3.7%) | 237 (52.5%) | 184 (40.7%) | 13 (2.8%) | -
Lung | 351 | 15 (4.2%) | 169 (48.1%) | 158 (45%) | 9 (2.5%) | -
Ovarian | 495 | 19 (3.8%) | 276 (55.7%) | 183 (36.9%) | 17 (3.4%) | -
Pancreatic | 453 | 21 (4.6%) | 240 (52.9%) | 180 (39.7%) | 12 (2.6%) | -
Pituitary | 693 | 27 (3.8%) | 372 (53.6%) | 274 (39.5%) | 20 (2.8%) | -
Rectal | 991 | 43 (4.3%) | 526 (53%) | 391 (39.4%) | 30 (3%) | 1 (0.1%)
Average | | 4% | 52.2% | 40.7% | 2.7% | 0.007%
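Each cell in Tables 7 to 12 pairs a count with a percentage of that cancer's protein list; for example, 11 of the 330 breast cancer proteins fall in zone 1 of the HFPIN, which is 3.3%. A minimal sketch of that tabulation follows. The function name, the protein identifiers and the zone assignments are hypothetical, and skipping proteins that are absent from the network is an assumption of the sketch.

```python
def zonal_distribution(cancer_proteins, zone_of):
    """Count and percentage of a cancer's proteins that fall in each zone.

    `zone_of` maps protein -> zone number. Proteins absent from the network are
    skipped (an assumption), and percentages are relative to those that remain.
    """
    counts = {}
    for protein in cancer_proteins:
        if protein in zone_of:
            zone = zone_of[protein]
            counts[zone] = counts.get(zone, 0) + 1
    total = sum(counts.values())
    return {zone: (n, round(100.0 * n / total, 1)) for zone, n in sorted(counts.items())}

# Hypothetical zone assignment and cancer protein list (placeholder identifiers).
zone_of = {"G1": 1, "G2": 2, "G3": 2, "G4": 2, "G5": 3}
cancer_list = ["G1", "G2", "G3", "G5", "G9"]  # G9 is not in the network

distribution = zonal_distribution(cancer_list, zone_of)
```

Applying the same function to a zone assignment computed inside the induced subgraph of zone 2 would give the nested view shown in Tables 10 to 12.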

Distinguishing proteins using the self-similarity edifice

It is generally accepted that the degree of a node is a strong indicator of the importance and/or essentiality of the protein in the network [23,24]. As one looks at the various layers of zones, central zones of central zones tend to have higher degree in the entirety of the network than the other zones. For instance, proteins from zone 1 of zone 1 in the HFPIN have an average degree of 118, and those of zone 1 of zone 2 have an average degree of 85 (table 1). It has also been shown that, in general, both sensing pathways and proteins implicated in diseases tend to be pronounced in central positions [25]. While there is some disagreement about which is more important, sensing pathways or metabolic ones, we contend that sensing pathways are more important as they are likely to elicit a metabolic response to facilitate homeostasis. In view of the foregoing, we propose that proteins in zone 1 have a higher weighting than those in zone 2, and so on. So, for instance, nodes in zone 3 of zone 1 would have more weight than those in zone 1 of zone 2.
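One way to read the proposed weighting is as a lexicographic ordering of nested zone addresses, outermost zone first. The sketch below illustrates that reading only; the tuple encoding, the function name and the protein labels are assumptions, and the text does not prescribe numerical weights.

```python
def rank_by_zone_address(addresses):
    """Order proteins from most to least weight by nested zone address.

    `addresses` maps protein -> tuple of zone numbers, outermost zone first,
    e.g. (1, 3) means "zone 3 of zone 1". Tuples compare lexicographically,
    so every address starting with zone 1 precedes every address starting with zone 2.
    """
    return sorted(addresses, key=lambda protein: addresses[protein])

# Hypothetical proteins with their nested zone addresses.
addresses = {
    "prot_x": (2, 1),  # zone 1 of zone 2
    "prot_y": (1, 3),  # zone 3 of zone 1
    "prot_z": (1, 1),  # zone 1 of zone 1
}
ranking = rank_by_zone_address(addresses)
```

Here `prot_y`, in zone 3 of zone 1, is ranked ahead of `prot_x`, in zone 1 of zone 2, as in the example given in the text.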

Self-similarity in other biological networks

Both gene co-expression and regulatory networks show stingray structures. When the gene co-expression network is subjected to sub-structure analysis, the majority of the induced subgraphs have single centres. However, as we delve further, we no longer obtain single centres. Also, it cannot be fully established that the centres are from the same family (table 13).
Table 13

Metrics of induced subgraphs of Co-expression network

NetworkNodesEdgesDiameterCentreZones around centre
     123456789 
Co-exp717125426016TFRC5752061262178920545686Nodes
     36710527664569Ave degree
     221222222Min degree
     122865432446322081214Max degree
     001000000# quills
Co-exp1573925754GP1BB527414      Nodes
     3455010      Ave degree
     624      Min degree
     94822214      Max degree
     000      # quills
Co-exp11527895043SCNN1A46462       Nodes
     367126       Ave degree
     264       Min degree
     922354       Max degree
     00       # quills
Co-exp111464808693GNAS, HAB14548       Nodes
     348203       Ave degree
     2496       Min degree
     847390       Max degree
     00       # quills
Co-exp217907949410PRR11314634550137272   Nodes
     27179341832   Ave degree
     2822222   Min degree
     47037227098102   Max degree
     000000   # quills
Co-exp21314372685FEN116211734      Nodes
     29420482      Ave degree
     1203224      Min degree
     333292156      Max degree
     000      # quills
Co-exp21116223123332 GENES1291       Nodes
     2792       Ave degree
     1182       Min degree
     3192       Max degree
     00       # quills
However, the gene regulatory networks we examined, despite their small orders, show a much more pronounced articulation of the phenomenon (see supplementary tables S7 to S10).

Methods

Evaluation of biological networks as metric spaces

We treated the human PINs (HFPIN, HSN, NHPIS) and the gene co-expression and regulatory networks as metric spaces by defining the usual graph-theoretic distance between nodes of a graph. Using a Python wrapper around the C++ Boost Graph Library (http://www.boost.org/), we used the Dijkstra algorithm to compute the shortest distances between all pairs of nodes and then identified the node or nodes whose greatest distance to any other node is smallest. These nodes form the network centre(s). Nodes were then divided into zones according to their distance from the topological centre(s). For each zone, we calculated the degree distribution and also considered the connectivity of the subgraph induced by that zone.
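The procedure above can be sketched in a few lines of standard-library Python. This is an illustration, not the authors' implementation: it uses breadth-first search, which gives the same shortest distances as Dijkstra on an unweighted graph, instead of the Boost Graph Library. It also assumes a connected, undirected graph stored as a dictionary of neighbour sets, and it measures distance to the nearest centre when there are several centres. All function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centres_and_zones(adj):
    """Return (centres, zones) for a connected, undirected graph.

    adj maps each node to the set of its neighbours. A centre is a node
    of minimum eccentricity; zone k holds the nodes at distance k from
    the nearest centre (an assumption when there are several centres).
    """
    ecc = {u: max(bfs_distances(adj, u).values()) for u in adj}
    radius = min(ecc.values())
    centres = sorted(u for u in adj if ecc[u] == radius)
    nearest = {}
    for c in centres:
        for v, d in bfs_distances(adj, c).items():
            nearest[v] = min(d, nearest.get(v, d))
    zones = {}
    for v, d in nearest.items():
        if d > 0:
            zones.setdefault(d, set()).add(v)
    return centres, zones


def induced_subgraph(adj, nodes):
    """Subgraph keeping only the edges with both ends in nodes."""
    return {u: adj[u] & nodes for u in nodes}


# Toy graph: path a-b-c-d-e with an extra leaf f attached to c.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d", "f"},
    "d": {"c", "e"}, "e": {"d"}, "f": {"c"},
}
centres, zones = centres_and_zones(toy)
zone1 = induced_subgraph(toy, zones[1])
print(centres, zones)
```

Applying `centres_and_zones` again to `induced_subgraph(adj, zones[k])` gives the nested "zone of zone" layers, provided the induced subgraph is connected or is first restricted to its largest component.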

Pathway and function enrichment analysis

To determine whether the zones of the human PINs we considered have biological significance, we divided proteins into subsets based on their distance from the true topological centre. The protein set representing each zone was then subjected to a pathway over-representation analysis to determine whether the zone was specialised for specific functions. The Comparative Toxicogenomics Database Gene Set Enricher web service (http://ctdbase.org/tools/enricher.go) and Gene Ontology enrichment analysis (http://geneontology.org/page/go-enrichment-analysis) were used to perform the enrichment analysis, and a corrected P-value of 0.01 was chosen as the statistical significance cutoff. Lastly, where such enrichment was observed, we calculated the proportion of proteins involved in each enriched pathway to assess whether any zone displayed functional specialization.

Cancer gene expression data sources

We considered gene expression absence/presence calls for the following cancer types: breast, lung, kidney, pancreas, liver, cervix, ovary, glioblastoma, pituitary, glioma, fallopian, endometrium and rectum. The data were downloaded from the Gene Expression Barcode database (http://barcode.luhs.org/index.php?page=genesexp). We downloaded the genes expressed in at least 99% of samples of a cancer of interest on the Human HGU133 platform. Gene expression was used as a proxy for protein expression and was mapped onto the PINs of interest to identify the zone in which each gene product is located.
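The mapping step can be sketched as a simple count of expressed genes per zone. The sketch assumes a precomputed dictionary from gene identifier to zone number and ignores genes that are absent from the network. The gene names, the dictionary `zone_of` and the function name `zone_distribution` are hypothetical.

```python
from collections import Counter


def zone_distribution(expressed_genes, zone_of):
    """Count expressed genes per zone and give each count as a percentage.

    Genes that do not occur in the network (no entry in zone_of) are
    skipped, so percentages are relative to the mapped genes only.
    """
    mapped = [g for g in expressed_genes if g in zone_of]
    counts = Counter(zone_of[g] for g in mapped)
    total = len(mapped)
    return {z: (n, 100.0 * n / total) for z, n in sorted(counts.items())}


# Toy data: G5 is expressed but is not a node of the network.
zone_of = {"G1": 1, "G2": 2, "G3": 2, "G4": 3}
expressed = ["G1", "G2", "G3", "G5"]
dist = zone_distribution(expressed, zone_of)
print(dist)
```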

Testing the difference between proportions

We performed a z-test for the difference between two population proportions p1 and p2. We stated the null and alternative hypotheses and set the level of significance at P < 0.01. We then determined the critical value(s) from the statistical table. Finally, we computed the standardized test statistic.
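The statistic itself is not reproduced in this text. For reference, the standard pooled form of the two-proportion z statistic is given here, on the assumption that it is the one intended. The symbols x_i (number of successes) and n_i (sample size) for sample i are introduced for illustration and are not taken from the source.

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\bar{p}\,(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
```

Under the null hypothesis p1 = p2, z is approximately standard normal for large samples, which is why the critical values come from the normal table.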

Statistical significance of the proportional analysis of pathway representation of zones

To test differences between proportions among zones, a statistical comparison of the observed differences is needed. We conducted a two-sample z-test for the difference between proportions for the top statistically enriched REACTOME pathways among zones. The null hypothesis H0 was that zones in the periphery of human PINs have proportions as high as those of zones closest to the centre, where the comparison concerns sensing functions in zones closest to the centre and metabolic functions in zones in the periphery. If P < 0.01, we rejected H0 and concluded that there is enough evidence at the 1% level to support our claim that zones closest to the centre have higher proportions than zones in the periphery.
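A comparison of this kind can be sketched with the Python standard library alone. The sketch uses the standard pooled two-proportion z statistic and a one-sided alternative (first proportion larger than the second). Both choices are assumptions, since the text does not state the exact form used. The counts in the example are hypothetical and are not results from the study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for H1: p1 > p2 (one-sided, assumed).

    x1, x2 are the numbers of proteins in the pathway of interest and
    n1, n2 the zone sizes. Returns (z, one-sided P-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 proteins in a central zone versus
# 30 of 100 proteins in a peripheral zone fall in the pathway.
z, p_value = two_proportion_z_test(60, 100, 30, 100)
reject_h0 = p_value < 0.01
print(round(z, 3), reject_h0)
```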

Author Contributions

E.F. implemented the algorithms, performed the analyses and drafted the original manuscript. E.C.M. proposed the concept of analyzing PINs as a self-similarity structure and oversaw the topological and statistical analyses. J.G. designed, oversaw and assisted in the functional evaluation tests and the biological interpretation of the results. E.C.M. and J.G. supervised the study and edited the manuscript. All authors have read and approved the final manuscript.