Andrew E Teschendorff1, Christopher R S Banerji2, Simone Severini3, Reimer Kuehn4, Peter Sollich4. 1. 1] Statistical Cancer Genomics, Paul O'Gorman Building, UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, United Kingdom [2] CAS Key Lab for Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China. 2. 1] Statistical Cancer Genomics, Paul O'Gorman Building, UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, United Kingdom [2] Centre for Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London WC1E6BT United Kingdom [3] Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom. 3. Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom. 4. Department of Mathematics, King's College London, London WC2R 2LS, UK.
Abstract
One of the key characteristics of cancer cells is an increased phenotypic plasticity, driven by underlying genetic and epigenetic perturbations. However, at a systems-level it is unclear how these perturbations give rise to the observed increased plasticity. Elucidating such systems-level principles is key for an improved understanding of cancer. Recently, it has been shown that signaling entropy, an overall measure of signaling pathway promiscuity, and computable from integrating a sample's gene expression profile with a protein interaction network, correlates with phenotypic plasticity and is increased in cancer compared to normal tissue. Here we develop a computational framework for studying the effects of network perturbations on signaling entropy. We demonstrate that the increased signaling entropy of cancer is driven by two factors: (i) the scale-free (or near scale-free) topology of the interaction network, and (ii) a subtle positive correlation between differential gene expression and node connectivity. Indeed, we show that if protein interaction networks were random graphs, described by Poisson degree distributions, that cancer would generally not exhibit an increased signaling entropy. In summary, this work exposes a deep connection between cancer, signaling entropy and interaction network topology.
One of the key characteristics of cancer cells is an increased phenotypic plasticity, driven by underlying genetic and epigenetic perturbations. However, at a systems-level it is unclear how these perturbations give rise to the observed increased plasticity. Elucidating such systems-level principles is key for an improved understanding of cancer. Recently, it has been shown that signaling entropy, an overall measure of signaling pathway promiscuity, and computable from integrating a sample's gene expression profile with a protein interaction network, correlates with phenotypic plasticity and is increased in cancer compared to normal tissue. Here we develop a computational framework for studying the effects of network perturbations on signaling entropy. We demonstrate that the increased signaling entropy of cancer is driven by two factors: (i) the scale-free (or near scale-free) topology of the interaction network, and (ii) a subtle positive correlation between differential gene expression and node connectivity. Indeed, we show that if protein interaction networks were random graphs, described by Poisson degree distributions, that cancer would generally not exhibit an increased signaling entropy. In summary, this work exposes a deep connection between cancer, signaling entropy and interaction network topology.
One of the key features of cancer is an increased cellular plasticity, mediated by an
increased promiscuity in signaling patterns, and driven by underlying genetic and
epigenetic aberrations which cause a fundamental rewiring of the intracellular signaling
network12345678. Every aberration found in a cancer
cell can be thought of as a perturbation if the aberration affects the gene
functionally. Such perturbations can be classed as activating, if they result in an
increased functional activity of the gene (e.g. amplification and overexpression of
ERBB2 in breast cancer), or inactivating, if it compromises gene function
(e.g. silencing through promoter DNA methylation). Whilst the effect of certain specific
perturbations on gene function can be predicted, it is much less clear how individual
perturbations affect the cellular phenotype as a whole, since this depends on the
collective nature of the other aberrations that are present in the same cell. Predicting
the net effect of multiple perturbations in a signaling network is hard due to complex
effects such as pathway redundancy and epistasis36. Moreover, in the
context of cancer, although the effect of specific aberrations on cell function is
known, it is yet unclear how individual cancer perturbations may contribute to the
observed increased signaling promiscuity and phenotypic plasticity.One way to approach this challenge computationally, is to anchor the analysis on global
measures which capture salient features of the cellular phenotype, and which are
computable from, say, a sample's molecular profile (e.g. a
sample's gene expression profile). Here we are particularly interested in
measuring signaling promiscuity since evidence is mounting that this underlies a
sample's phenotypic plasticity8. In previous work we have
started to explore a measure which approximates intra-sample signaling promiscuity, and
which is known as network signaling entropy91011. Signalling entropy
is computed from integrating a sample's genome-wide gene expression profile
with a protein interaction network and, as shown by us, provides a surprisingly good
estimate of a sample's height in Waddingtons's differentiation
landscape, with human embryonic stem cells (hESCs) exhibiting the highest levels of
entropy10. Indeed, signalling entropy was able to discriminate
cellular samples according to their differentiation potential within distinct lineages,
including hematopoietic, mesenchymal and neural lineages, and with terminally
differentiated cells within these lineages exhibiting the lowest levels of entropy10. Importantly, signaling entropy was also found to be higher in cancer
compared to normal tissue, consistent with the view that cancer cells represent a more
undifferentiated stem-cell like state, characterised by an increase in phenotypic
plasticity810.Given that increased signaling entropy is such a robust and characteristic feature of
differentiation potency and cancer, and that it is also amenable to computation910, it is of great theoretical and biological interest to study the
changes in entropy caused by cellular network perturbations. In the context of cancer,
two well-known network perturbations are the overexpression and underexpression of
oncogenes and tumour suppressor genes, respectively, and although these perturbations
are known to result in the uncontrolled activation of cell-growth and cell-proliferation
pathways, it remains unclear how these perturbations affect signalling promiscuity. In
order to deepen our understanding, we here decided to study the effect of such
perturbations on signaling entropy, using both simulated and real data, and using a
variety of different network types in order to assess the impact of network topology.
Specifically, we consider Erdos-Renyi random (Poisson) graphs12, scale
free networks13, as well as real protein-protein interaction (PPI)
networks141516. In doing so, we discover that in Poisson
networks, perturbations (be they activating or inactivating) lead to reductions in the
global entropy, but that this is not true for scale-free and more realistic PPI
networks. In networks exhibiting a scale-free, or near scale-free topology, we show that
gene expression perturbations affecting hubs exhibit a striking bi-modality, leading to
increases or decreases in the global entropy rate depending on the directionality of the
expression change. We further expose a subtle yet significantly positive correlation
between differential gene expression in cancer and node-degree, which we show drives the
increased signaling promiscuity of cancer, but only if the underlying protein
interaction network has a scale-free (or near scale-free) topology. Thus, this work
makes a deep connection between a defining feature of the cancer phenotype, i.e. high
signaling entropy, its differential gene expression pattern and the (near) scale-free
topology of real PPI networks.Although there are many studies on network perturbations, it is worth clarifying that the
network perturbations and outcome of interest (i.e. the entropy rate) considered in this
work are very different from the perturbations and outcomes of interest considered in
previous studies1718192021. Specifically, we consider network
perturbations which only alter the local edge weights without altering the underlying
network topology91022. Moreover, our network perturbations can be
both activating as well as inactivating, representing the two different types of cancer
alterations affecting oncogenes and tumour suppressors, respectively. In contrast, much
of the previous literature has dealt with the effects of removing specific nodes in
unweighted networks1718, a type of inactivating perturbation which
alters the underlying network topology, focusing on tolerance and robustness as outcome
measures17181921. Thus, from a network theoretical perspective,
the important novel insights reported in this work are made possible by considering a
novel type of network perturbation in the context of weighted networks defined by a
stochastic matrix. We should also stress that our outcome of interest, signaling
entropy, is a systems-level measure that is constructed from the genome-wide expression
profile of a given sample, and therefore has little to do with the protein signaling
disorder measures considered by other studies and which do not use gene expression
data23.
Results
Increased signaling entropy in cancer is driven by overexpression of hub
genes
In earlier work we demonstrated that signaling entropy, a measure of the
signaling promiscuity in a cellular sample, is increased in cancer compared to
normal tissue, irrespective of tissue type91011. This
increased signaling entropy is consistent with the observed increased phenotypic
plasticity of cancer cells (see e.g. Ref. 8). Thus,
increased signaling entropy has emerged as a cancer systems hallmark911. Signaling entropy is estimated as the entropy rate24 of a sample-specific stochastic matrix which models the
signaling interactions in the sample (Methods). This stochastic matrix is
computed by integrating the gene expression profile of the sample with a
comprehensive PPI network, invoking the mass-action principle to define the
edge-weights in the network (Methods). The mass-action principle is based
on the assumption that two proteins, which have been reported to interact, are
more likely to interact in a given sample if both are highly expressed in that
sample.Here we wanted to shed light on why, theoretically, we observe increased
signaling entropy in cancer. We decided to use liver cancer as a model since
liver represents a relatively homogeneous tissue, and is thus less affected by
contaminating non-epithelial cells. We downloaded gene-normalised RNA-Seq data
for a matched subset of 50 normal liver and 50 liver cancer samples from The
Cancer Genome Atlas (TCGA). Confirming our earlier work using Affymetrix gene
expression data91011, liver cancer exhibited a significantly
higher signaling entropy rate compared to normal liver tissue (Fig. 1A). Randomisation of the RNA-Seq profiles over the nodes in
the network resulted in a significantly reduced difference in entropy rate
between normal and cancer tissue (Fig. 1A), indicating (as
pointed out by us previously11) that the entropy increase in
cancer is driven by a subtle interplay between specific gene expression changes
and where these happen on the network. Specifically, we posited that the
topological properties of the genes undergoing the largest changes in gene
expression would be key features dictating the change in signalling entropy.
Figure 1
Increased entropy in liver cancer is driven by increased entropy at
hubs:
(A) Boxplots comparing the entropy rate (SR) of 50 normal liver samples (N)
to 50 matched liver cancer specimens (C), derived from RNA-Seq data of the
TCGA consortium. P-value is from a one-tailed Wilcoxon rank sum test,
testing the hypothesis that entropy rate is higher in cancer. Also shown is
the SR between normal and liver cancer for a case where the gene expression
profiles were randomly permuted (perm) over the interaction network. Observe
how the difference in the SR between normal and cancer is reduced and even
takes an opposite directionality, demonstrating that the interplay between
gene expression changes and network topology is dictating the higher
signaling entropy in cancer. (B) Boxplots showing the change in the mean
local entropy rate (LSR)
(〈π〉
−
〈π〉)
between normal and cancer of each node (gene) as a function of node degree,
positive values indicating higher values in cancer. (C) Scatterplot of the
differential change in the mean local entropy rate against the differential
change in the mean invariant measure (INVP)
(〈π〉
−
〈π〉).
Each data point is one node (gene). (D) Boxplots showing the change in the
mean local entropy (LS) of each node (gene)
(〈LS〉
−
〈LS〉)
between normal and cancer, as a function of node degree.
Since each gene i contributes an amount
π to the entropy rate of a
given sample (Methods), we computed for each gene the difference in the means of
its local entropy rate, π, between
normal and cancer tissue. In order to help interpretation, we also computed for
each gene the difference in the means of the invariant measure
π between normal and cancer, as well as
the difference in the average local entropy LS (Methods). All
these changes were assessed in relation to the connectivity of the genes in the
network. We observed that the entropy rate increase in cancer is driven mainly
by hubs, i.e. the nodes of highest degree in the network (Fig.
1B). Changes to the local entropy rates were driven by concomitant
changes in the average invariant measure (Fig. 1C). Thus,
hubs exhibited preferential increases in their average invariant measure, whilst
also demonstrating positive increases in the average local entropy (Fig. 1D). Since the invariant measure value at a node
i represents the steady-state probability of finding a random walker
at this node, the observed preferential increase in the invariant measure at
hubs means that there is an increased signaling flux through these hub nodes in
cancer.To gain insight as to why there is an increased signaling flux through hubs in
cancer, we focused on the hub gene exhibiting the largest increase in the local
entropy rate. This was the gene BUB1 (Fig. 2A). A
scatterplot of the expression values of BUB1 and that of its neighbors
(813 neighbors) in a representative normal sample versus the corresponding
expression values in a representative cancer sample, demonstrates that most of
the expression differences involve increases in gene expression, implicating
both the hub itself as well as some of its neighbors (Fig.
2B). Thus, for the majority of neighbors of BUB1, the
increased expression of BUB1 will, according to the mass action
principle, drive increased signaling through this hub. Indeed, for each one of
BUB1's neighbors we ranked its neighbors according to the
largest increase in gene expression, revealing that the original hub (i.e.
BUB1) ranked among the top 2% centile for 99% of the hub neighbors
(SI Appendix, fig. S1). Interestingly, this effect was
not unique to BUB1 since high-degree hubs generally exhibited a
significant skew towards increased gene expression in cancer (Fig. 2C–D).
Figure 2
Preferential overexpression of hub genes in cancer:
(A) Boxplot showing the local entropy rate (LSR) against normal/cancer
status, for the hub gene (BUB1) exhibiting the largest increase in
the local entropy rate. P-value is from a Wilcoxon rank sum test. (B)
Scatterplot of gene expression values between a representative normal
(x-axis) and cancer (y-axis) sample for the gene showing the largest
increase in the local entropy rate (gene BUB1, marked in red) and
that of its neighbours in the PPI network (over 800 neighbours, shown in
black). (C) Boxplot of the average difference in gene expression between
normal and cancer (positive values indicate higher expression in cancer)
against node-degree class. Observe how the highest-degree hubs show
preferential increased expression in cancer, whereas the largest reductions
in expression target low-degree nodes. (D) Density plot of the average
difference in gene expression between normal and cancer for two classes of
genes: hubs (defined as nodes of degree >316) and nodes of degree
1 (k = 1). The number of each is indicated, and the P-value is from a
Kolmogorov-Smirnov test, testing for a difference in their statistical
distributions.
Confirming the biological significance of these results, we reached very similar
conclusions by repeating the above analysis in the independent Affymetrix gene
expression data set of normal liver and liver cancer tissue25 (SI
Appendix, fig. S2–S3). Thus, the increased entropy rate in liver
cancer is driven mainly by the increased expression of the highest degree hubs
in the PPI network.
Effect of cancer perturbations on signaling entropy
That the highest degree genes show preferential expression increases in cancer
(Fig. 2C–D, SI Appendix, fig. S3) suggests
an intricate link between network topology and differential expression.
Confirming this further, in both liver expression sets we also observed that the
genes exhibiting the largest, or most significant, decreases in expression
preferentially mapped to low-degree nodes (Fig.
2C–D, SI Appendix fig. S3–S4).This intricate correlation between differential expression and node degree
motivated us to pursue a deeper understanding of the complex interplay between
network topology, gene expression perturbations and entropy rate. Intuitively,
and from the perspective of a gene i that interacts with an oncogenic
hub, overexpression of the latter would lead to an increased outgoing signaling
flux of node i towards the hub, potentially leading to an increase in the
overall entropy rate (Fig. 3A). Interestingly,
underexpression of a low-degree node, which may connect to a hub either directly
or indirectly through an intermediate node i would also lead to an
increased signaling flux through the hub (Fig. 3A). Thus,
the two characteristic topological features of differential gene expression
changes in cancer could synergize causing increased signaling flux through key
hubs. To test whether this is indeed the case, we performed a perturbation
analysis for the top 100 genes ranked according to fold-change between normal
liver and liver cancer. The initial signaling distribution was defined by
invoking the mass action principle on the average expression profile over all 50
normal liver samples. Next, each of the top 100 ranked genes was individually
perturbed by changing its expression level according to the observed difference
between normal and cancer tissue. Confirming our hypothesis, underexpressed
genes (which generally did not target hubs) led to marginal increases in the
entropy rate, whilst overexpressed hubs caused significant entropy increases
(Fig. 3B). Interestingly however, overexpression led
to marginal entropy decreases whenever it did not target the highest degree
hubs, suggesting that such perturbations draw away signaling flux from the major
hubs (Fig. 3B).
Figure 3
Effect of cancer perturbations on signaling entropy:
(A) Examples of two expression perturbations typically found in cancer. Top
depicts the example of an oncogenic hub undergoing overexpression in cancer,
which has the effect of drawing in signaling flux from a neighbour i.
Example at the bottom depicts the underexpression of a low-degree
“tumour suppressor” node (e.g. a transcripton factor),
which from the perspective of node i causes, indirectly, an increased
signaling flux through the nearby hub. (B) Perturbation analysis of the top
100 genes ranked according to fold-change between normal and liver cancer.
Plots shows the entropy rate after perturbation (y-axis) against node-degree
(x-axis), with colors indicating over or underexpression. Black horizontal
line defines the entropy rate of the average expression profile of normal
liver (i.e. before the perturbation).
The effect of perturbations on signaling entropy is dependent on network
topology
To further investigate the effect of individual perturbations on signaling
entropy, as well as the role of the underlying network topology, we devised a
simulation framework on toy networks, perturbing each node in turn, and
recording the effect on the entropy rate (Fig. 4A,
Methods). To simplify the analysis we considered an initial uniform edge weight
configuration, defining an unbiased random walk on the graph. We note that this
initial configuration represents a state of relatively high signaling entropy,
but not of maximal entropy (see Methods). As activating perturbations
we consider local increases in gene expression, whereby all the weights of edges
converging on a perturbed node i are assigned a relatively large weight
(Fig. 4B). Thus, as seen from the perspective of a
neighboring node j, before perturbation, node j has maximal local
entropy, given by log k (where k is the
degree of node j), whilst after the perturbation, the node's
local entropy is close to 0 (Fig. 4B). We emphasize again
that although in the initial configuration all local entropies are maximal, that
the initial entropy rate over the whole network is not maximal (see Methods).
Thus, after the perturbation, the global entropy rate of the network could
increase or decrease.
Figure 4
Cancer perturbations may increase the entropy rate on networks with
scale-free topology but not on random Poisson graphs:
(A) A cartoon of the network perturbation analysis: each node i of the
network is perturbed in turn by changing its expression value. The case of
overexpression is here indicated in red. The increased expression draws in
signaling flux from neighbours (only one perturbed edge is shown). The
entropy rate of the network after perturbing node i,
SR, is computed and compared to the entropy rate
SR of the original unperturbed network. For n nodes in the
network we get a distribution of entropy rate changes (SR
− SR, i = 1,…,n). (B) Zoomed-in
version of a network perturbation, whereby a node i undergoes a
perturbation (here overexpression). From the perspective of a neighbouring
node j, the perturbation causes a low signaling entropy configuration
around node j. Key question is how does this perturbation affect the
global entropy rate. (C) Perturbation analysis result, in which each node
(gene) of the network was perturbed through overexpression (red) or
underexpression (green). Plotted is the global entropy rate (SR) after the
perturbation (y-axis) against the degree of the perturbed node (x-axis), for
3 different networks: Erdos-Renyi (ER) graph, scale-free (SF) network and
the full PPI network (PPI). Black dashed line denotes the entropy rate
before the perturbation. In each plot there as many data points as there are
nodes in the network, each value corresponding to the perturbation of only
one node. Number of nodes (nn), average degree (avK) and median degree
(medK) are given.
In order to understand the potential impact of network topology, we first
conducted the perturbation analysis above on Erdos-Renyi (ER) random graphs, for
which the degree distribution is Poisson. For such ER graphs, we observed that
activating perturbations (i.e. increases in gene expression), always led to a
reduction in the global entropy rate, irrespective of node degree (Fig. 4C). Repeating the analysis for inactivating
perturbations, i.e causing nodes to undergo underexpression, we observed that
almost all nodes led to a decrease in entropy. Thus, given that cancer is
characterised by an increase in signaling entropy, this suggests that the
emergence of an increased signaling promiscuity regime in cancer must be due
either to specific topological features not present in random graphs, or to
non-random combinations of perturbations.To investigate this further, we next performed the same perturbation analysis
above, but now on networks characterised by a scale-free (or near scale-free)
topology, a key feature of real biological networks26. The
scale-free networks were matched to the same size and average connectivity than
the previously considered Erdos-Renyi graphs. Remarkably, in scale-free networks
we observed that activating perturbations exhibited a bi-modal response, with
perturbations at lower-degree nodes resulting in a reduction of the global
entropy rate, whilst hubs exhibited increases (Fig. 4C).
In fact, we observed two distinct regimes with an opposite functional
relationship between entropy change and node-degree (Fig.
4C). In the low-degree regime, the entropy rate decreased as node
degree increases, whereas in the high-degree regime one observes entropy
increases (Fig. 4C). Interestingly, this bi-phasic
behaviour was not seen for inactivating perturbations where we observed a
monotonic decrease of entropy with node degree (Fig. 4C).
In stark contrast to Poisson networks, high-degree nodes in the scale-free
network exhibited a bi-modal response dependent on the directionality of the
perturbation (Fig. 4C): overexpressed hubs led to entropy
increases, while underexpressed hubs led to corresponding decreases.Next, we wanted to test whether this bi-phasic and bi-modal behaviour is also
seen in real PPI networks. We first checked that our PPI network exhibited an
approximate scale-free topology (SI Appendix, fig. S5). Its clustering
coefficient was also significantly higher than that of a degree-distribution
matched scale-free network (SI Appendix, fig. S5). Performing the perturbation
analysis on the PPI network, we observed once again two phases, which was
particularly striking for activating perturbations, with one phase exhibiting a
negative correlation between node degree and entropy, whilst the hub regime
exhibited a positive correlation (Fig. 4C). Very
interestingly, however, increases in entropy were only observed for the
highest-degree hubs, with lower-degree hubs exhibiting decreases which were
surprisingly also of a larger magnitude (Fig. 4C). Thus,
in networks with a scale-free or an approximate scale-free topology,
overexpression of the highest degree hubs leads to an increase in the entropy
rate. But increasing signaling flux through lower-degree nodes, even if of
relatively high degree, leads to an overall reduction in the diffusion rate.From the combined perturbation analysis, we can thus see that individual
perturbations on an Erdos-Renyi graph, be they activations or inactivations (but
both causing a local reduction in entropy), invariably lead to a reduction in
the global entropy rate. This is in stark contrast to networks with a scale-free
or approximate scale-free topology, where we observe that gene activations can
have opposite effects on entropy rate depending on the degree of the activating
nodes.
Entropy rate increase in cancer requires a scale-free interaction network
topology
The previous perturbation analysis strongly supports the view that a scale-free,
or near scale-free network topology, is important for the observed increased
entropy rate in cancer. To test this formally, we recomputed the entropy rate of
all 50 normal liver and 50 liver cancer samples, but now using an underlying
Erdos-Renyi (ER) interaction network matched to the same size and average
connectivity of the full PPI network. In order to faithfully preserve the
correlation between gene expression and node degree of the PPI network, nodes of
the ER network were ranked according to degree and gene expression values
assigned according to their corresponding rank/centile in the original PPI
network. Thus, this node mapping between the two networks preserves the observed
rank correlation between differential expression and node-degree, allowing us to
objectively assess the importance of the scale-free property. Recomputation of
the entropy rates of all 100 samples on the ER network revealed no significant
difference between normal and cancer, thus demonstrating that the observed
entropy rate increase in cancer requires the scale-free property of the
interaction network (Fig. 5A). Supporting this further, we
observed, in two other matched normal-cancer RNA-seq expression sets from the
TCGA, that the entropy was no longer higher in cancer when the PPI network was
replaced with an equivalent ER graph (Fig.
5B–C). In independent Affymetrix gene expression data, we
observed that the cancer-associated increase in the entropy rate was reduced
upon computing entropy on an equivalent ER network, in three out of four studies
(SI Appendix, fig. S6). Thus, in 6/7 data sets, there was a reduction in the
entropy rate difference between cancer and normal tissue (Binomial, P = 0.008),
supporting the view that a scale-free interaction topology is indeed necessary
for the higher entropy signaling dynamics of cancer.
Figure 5
Entropy rate increase in cancer requires the scale-free topology of the PPI
network:
(A) Boxplots of the entropy rate (SR) for the 50 normal liver and 50 liver
cancer samples as evaluated on the original full PPI network (left), as well
as on an equivalent Erdos-Renyi graph (middle). P-values are from a
Wilcoxon-rank sum test. Corresponding ROC curves and AUC values (right). (B)
As A) but for TCGA RNA-Seq data from 27 colon cancers and 27 matched
normals. (C) As A) but for TCGA RNA-Seq data from 52 prostate cancers and 52
matched normals.
Discussion
Signaling entropy, a measure of the overall uncertainty or promiscuity in signaling
patterns within a cellular sample, has been shown to be of biological significance
in a variety of different contexts91011. In cellular
differentiation it provides a proxy to the energy potential (i.e. height) of
Waddington's epigenetic landscape, allowing the differentiation potential
of a sample to be assessed purely from its genome-wide transcriptomic profile10. Similarly, signaling entropy also provided us with a useful
framework in which to identify specific systems-level features characterising
cancer, one of which being the increased signaling promiscuity of cancer compared to
its corresponding normal tissue91011. This is important because
an increased signaling promiscuity could underlie the increased phenotypic
plasticity of cancer, as observed e.g. by Pisco et al8.In this work we aimed to obtain a deeper theoretical understanding as to (i) why
signaling entropy is increased in cancer and (ii) why it is such a robust
discriminatory feature. We have here demonstrated that the increase in signaling
entropy is driven by two factors. First, a subtle positive correlation between
differential gene expression and the degree of the corresponding proteins in the PPI
network. This correlation amounts to hubs exhibiting preferential increases in gene
expression, whilst those genes exhibiting the most significant underexpression map
preferentially to low-degree nodes. Second, the observed increase of entropy in
cancer requires the scale-free (or near scale-free) topology characterising PPI
networks. Indeed, by considering a Poisson network with an identical rank
correlation coefficient between differential expression and node-degree, we no
longer consistently observed a significant increased entropy rate in cancer (Fig. 5). Given the demonstrated biological significance of the
entropy rate1011, this last result thus exposes a deep connection
between the cancer phenotype and the underlying scale-free property of real PPI
networks. It suggests that if the degree distribution of a PPI network were Poisson,
that the transcriptomic changes seen in cancer would not define a highly promiscuous
signaling regime. In other words, our data support the view that cancer
“hijacks” the scale-free property of real signaling networks
in order to facilitate increased signaling promiscuity and intra-tumour
heterogeneity.The novel insights described above also explain why the entropy rate provides such a
robust discriminatory feature of the cancer phenotype. The robustness stems from the
subtle correlation between differential expression and node-degree. Although gene
expression data is notoriously noisy, there is generally speaking good agreement
across independent studies when comparing the changes in differential gene
expression between two marked phenotypes such as normal and cancer tissue27. Secondly, although current PPI networks only represent mere
caricatures of the real interactions in a cell, the “hubness”
of a protein is likely to be a very robust feature. Indeed, that a given protein has
exceptionally many interactions, thus defining a hub in a network, is likely to be a
very robust feature, despite the fact that the specific interaction space of the hub
may contain many false negatives and false positives15. Thus, the
relative robustness of differential expression and hubness drives the robustness of
the observed correlation between differential expression and node degree, which in
turn explains why increased signaling entropy is such a consistent feature of the
cancer phenotype910. Given the robustness of signaling entropy as a
marker of differentiation potency10, it is therefore tempting to
speculate that a subtle correlation between differential expression and node degree
also exists in the context of normal cellular differentiation. Furthermore, it will
be interesting to explore if the scale-free or near scale-free topology of PPI
networks is also a key element underlying the nature of pluripotency, multipotency
and terminal differentiation.Although many previous studies have explored differential gene expression changes in
cancer and other diseases in relation to network topology28293031323334353637, most of these have either
focused on global topological properties, or on finding differential gene modules,
or on studying absolute changes in differential expression. Indeed, a number
of studies agree in reporting that absolute differential expression correlates
negatively with node degree, meaning that hubs exhibit, on the whole, much smaller
changes in expression between disease phenotypes2830.
Interestingly, however, relatively little attention has been paid to studying the
directionality of differential gene expression in cancer in relation to
node degree. Here we have shown that there exists a subtle yet significantly
positive correlation between differential expression and protein-degree. On its own,
the biological significance of this correlation is unclear. However, by interpreting
this correlation in the novel contextual framework of signalling entropy, we have
here shown how, in the context of real (near) scale-free networks, it could underpin
the increased phenotypic plasticity of cancer.In summary, increased expression of oncogenic hubs, as well as reduced expression of
network-peripheral tumour suppressor genes, in interaction networks characterised by
a (near) scale-free topology, drives the high signaling entropy of cancer and could
thus underpin cancer's phenotypic robustness and plasticity. Further
in-depth study of the complex interplay between local protein activity changes,
their interaction network topology and the effect on signaling entropy is
warranted.
Methods
The protein protein interaction (PPI) network
We used a PPI network similar to that used in our previous publication38. Briefly, the human interaction network derives from the Pathway
Commons Resource (www.pathwaycommons.org)15, which brings together
protein interactions from several distinct sources, including the Human Protein
Reference Database (HPRD)14, the National Cancer Institute Nature
Pathway Interaction Database (NCI-PID) (pid.nci.nih.gov), the Interactome
(Intact) http://www.ebi.ac.uk/intact/ and the Molecular Interaction
Database (MINT) http://mint.bio.uniroma2.it/mint/. Protein interactions in this
network include physical stable interactions such as those defining protein
complexes, as well as transient interactions such as post-translational
modifications and enzymatic reactions found in signal transduction pathways,
including 20 highly curated immune and cancer signaling pathways from NetPath
(www.netpath.org)39. The network focuses on non-redundant interactions, only
included nodes with an Entrez gene ID annotation and on the maximally connected
component thereof, resulting in a connected network of 8,434 nodes (unique
Entrez IDs) and 303,600 documented interactions.
Normal and cancer tissue gene expression data sets
We focused on liver cancer because the associated normal tissue constitutes a
relatively homogeneous mass of cells, and thus the entropy rate is less likely
to be influenced by changes in tissue-type composition. We downloaded the level
3 gene normalized RNA-Seq data from the TCGA (www.cancergenome.nih.gov) for a
matched subset of 50 normal liver and 50 liver cancer samples. As validation, we
considered an Affymetrix expression data set, consisting of 37 normal livers
(including normal liver, cirrhosis and dysplasia) + 38 liver cancers25. To test generalisability, we also downloaded level 3 RNA-Seq
gene normalised data from the TCGA for prostate cancer (52 cancers &
52 matched normals) and colon cancer (27 cancers & 27 matched
normals). The other normal/cancer Affymetrix expression sets used have been
described previously10.
Construction of the sample specific stochastic matrix and entropy
rate
The construction of the entropy rate follows the same method described in our
earlier work1011. Briefly, we use the mass action principle to
define a stochastic matrix, p, for each individual sample.
In detail, let E denote the normalised expression level of
gene i in a given sample. For a given neighbour j ∈
N(i) (where N(i) labels the neighbours of
i in the PPI), the mass-action principle means that the probability
of interaction with j is approximated by the product
E, i.e. p
∝ E. Normalising this to ensure that
, we get for the stochastic
matrix,Clearly, if j
∉ N(i), then p = 0. From this
stochastic matrix one can then construct a local signaling entropy (LS)
aswhich reflects the level of
uncertainty or redundancy in the local interaction probabilities. We note that
the above expression for the local entropy is not normalised so that the maximum
possible entropy depends on the degree (k) of the node. In
fact,Finally, the signaling entropy
rate, SR, is defined in terms of the stationary distribution (or
invariant measure) π of the stochastic matrix
(πp = π), as2440i.e. this global signaling entropy rate
is a weighted average of the local entropies LS. We note that
although LS is independent of the expression level of gene
i, that the gene's contribution to the entropy rate, i.e.
π, is not. This is because
π will depend on the gene
i's expression level. In this work we refer to the term
LSR ≡
π as the local entropy rate
of gene i, whereas LS is just the gene
i's local entropy.
The maximum entropy rate
Given a connected network, the maximum entropy rate, maxSR, over the
network does not depend on the gene expression data but only on the adjacency
matrix of the network. In fact, the maximum entropy rate is attained for a
stochastic matrix p given by41where v and are the dominant right eigenvector and eigenvalue
of the adjacency matrix A, respectively. Thus, it is important to note
that the configuration of maximal local entropy, i.e. the configuration where
for each node i, p =
A/k and LS =
log k, is not the configuration of maximal global
entropy.
Perturbation simulation analysis
In what follows we describe the perturbation analysis performed on
Erdös-Renyi and scale-free networks, as well as on the full real PPI
network described earlier. The calculation of the global signaling entropy rate
is simplified significantly by the fact that the stochastic matrix defined by
equation 1 has the detailed balance property, i.e. the
stationary distribution obeys not only πp =
π, but the more restrictive condition
π =
π. This detailed balance
condition can be shown to implywhere F
is a normalisation constant and .The initial configuration for the perturbative analysis is that of maximal local
entropy for each node in the network, which as explained previously, does not
represent the state of global maximum entropy. To construct this initial
configuration we set the expression level of each gene/node to be identical
x = x. Thus, in the initial configuration,
x = k, and from detailed
balance we obtain for the stationary distribution thatwhere V is the number of nodes in the network
and where is the average degree. As
far as the entropy is concerned, the local entropy of each node i is
simply log k, so the initial entropy rate is simplyNow let us consider perturbing a gene
in the network by altering its expression level by an amount . Without loss of generality we label the
perturbed node by the index “1”, so that after
perturbation, the expression levels in the network are described by . The new stationary distribution then
becomesFor the
local entropies, we getwhere for i ∈
N(1), and (j ≠ 1). Thus, the change in the
entropy rate, ΔSR = SR′ −
SR, is easily computable following any
perturbation.In the actual analysis, when performing activating perturbations, we set x
= 2 and , whilst, when modeling
inactivating perturbations, we set x = 16 and . These values are typical for logged Affymetrix or
Illumina data, with highly expressed genes normally exhibiting values larger
than 12, and lowly expressed genes showing values smaller than 4.
Author Contributions
A.E.T. conceived and performed the research and wrote the manuscript. P.S. and R.K.
helped with analytical computations. C.R.S.B. and S.S. contributed useful comments
to an earlier version of manuscript.
Authors: Maryam Pouryahya; Jung Hun Oh; Pedram Javanmard; James C Mathews; Zehor Belkhatir; Joseph O Deasy; Allen R Tannenbaum Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2022-06-03 Impact factor: 3.702