Helene R McMurray1, Aslihan Ambeskovic2, Laurel A Newman2, Jordan Aldersley2, Vijaya Balakrishnan2, Bradley Smith2, Harry A Stern3, Hartmut Land4, Matthew N McCall5. 1. Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA; Department of Pathology and Laboratory Medicine, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA. 2. Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA. 3. Center for Integrated Research Computing, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA. 4. Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA; Wilmot Cancer Institute, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA. Electronic address: land@urmc.rochester.edu. 5. Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA; Wilmot Cancer Institute, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA. Electronic address: mccallm@gmail.com.
Abstract
Malignant cell transformation and the underlying reprogramming of gene expression require the cooperation of multiple oncogenic mutations. This cooperation is reflected in the synergistic regulation of non-mutant downstream genes, so-called cooperation response genes (CRGs). CRGs affect diverse hallmark features of cancer cells and are not known to be functionally connected. However, they act as critical mediators of the cancer phenotype at an unexpectedly high frequency >50%, as indicated by genetic perturbations. Here, we demonstrate that CRGs function within a network of strong genetic interdependencies that are critical to the malignant state. Our network modeling methodology, TopNet, takes the approach of incorporating uncertainty in the underlying gene perturbation data and can identify non-linear gene interactions. In the dense space of gene connectivity, TopNet reveals a sparse topological gene network architecture, effectively pinpointing functionally relevant gene interactions. Thus, among diverse potential applications, TopNet has utility for identification of non-mutant targets for cancer intervention.
Malignant cell transformation and the underlying reprogramming of gene expression require the cooperation of multiple oncogenic mutations. This cooperation is reflected in the synergistic regulation of non-mutant downstream genes, so-called cooperation response genes (CRGs). CRGs affect diverse hallmark features of cancer cells and are not known to be functionally connected. However, they act as critical mediators of the cancer phenotype at an unexpectedly high frequency >50%, as indicated by genetic perturbations. Here, we demonstrate that CRGs function within a network of strong genetic interdependencies that are critical to the malignant state. Our network modeling methodology, TopNet, takes the approach of incorporating uncertainty in the underlying gene perturbation data and can identify non-linear gene interactions. In the dense space of gene connectivity, TopNet reveals a sparse topological gene network architecture, effectively pinpointing functionally relevant gene interactions. Thus, among diverse potential applications, TopNet has utility for identification of non-mutant targets for cancer intervention.
Oncogenic mutations are critical for both cancer initiation and maintenance,
which has driven efforts to inhibit targetable oncoproteins as an effective strategy
for cancer treatment. Targetable mutations, however, are found only in a small
fraction of cancers. Thus, broader strategies for cancer intervention need to be
developed, such as targeting non-mutated proteins or molecular circuitry essential
to cancer cells. The complexity of cell regulation and the profound cellular
reprogramming associated with malignant transformation, however, provide formidable
barriers to elucidating disease-critical genetic circuitry. We propose that linking
genetic perturbation experiments with statistical modeling can provide inroads to
the discovery of functionally relevant gene regulatory network (GRN) architecture
and thus valuable identification and validation of genetic interactions in cancer
gene networks, which ultimately may inform the targeting of next-generation cancer
interventions.GRN modeling seeks to infer dependencies between genes and thereby provide
insight into the regulatory relationships that exist within a cell. Early methods to
estimate GRNs focused primarily on data arising from targeted experimental
perturbations (Ideker et al., 2000; Pe’er et al., 2001); however, these
methods relied on simplifying assumptions to reduce the computational demands of
these methods. As the size and scope of publicly available transcriptomic data
rapidly increased due to the formation of large cancer consortia such as The Cancer
Genome Atlas (TCGA), GRN modeling methodology transitioned to leveraging
observational data, which rely on correlation structure in the data produced by
natural biological covariation to infer regulatory relationships (Basso et al., 2005; Carro
et al., 2010; Langfelder and Horvath,
2008; Margolin et al., 2006; Zhang and Horvath, 2005). While the application
of these methods has provided insights into the regulatory systems crucial to
malignancy, these modeling approaches are hindered by technical sources of
covariation, such as batch effects or variable sample composition, either masking or
mimicking true biological dependencies (Zhang et
al., 2021). In addition, modeling of GRNs remains inherently difficult
due to high-order non-linear interactions and unmeasured sources of biological
variability that influence the GRN.Methodological and computational advances have greatly aided our ability to
model GRNs, which in turn has advanced our understanding of the processes involved
in both normal cellular function and disease progression. While much effort has been
dedicated to estimate GRNs from observational data, experimental perturbations allow
direct measurement of the effect that a change in one gene has on other measured
genes, which greatly facilitates the estimation of GRNs (Markowetz and Spang, 2003; Werhli et al., 2006; Zak
et al., 2003). Recent advances in computational hardware and
parallelization provide an opportunity to revisit the potential of
perturbation-based GRN estimation without the need to impose biologically
implausible restrictions on these models.In this article, we propose a GRN modeling procedure, called TopNet, based on
underlying gene perturbation data that addresses the core challenges of GRN
estimation: preprocessing and analysis of noisy experimental data, quantification of
uncertainty in gene expression changes following perturbation, exploration of a vast
model space to find networks supported by the gene perturbation data, and
identification of robust, biologically relevant network features. Our approach
adopts the ternary network modeling formalism proposed by Almudevar et al. (2011). Specifically, we consider a
ternary network modeling procedure that samples networks for which the network
attractors agree with the experimental data. In this article, we propose
improvements in both the search algorithm and the manner in which agreement between
the attractors and observed data is quantified.Previously, we were able to identify non-mutant mediator genes critical to
the cancer phenotype downstream of oncogenic mutations (Ashton et al., 2012; McMurray et al., 2008). Our approach to find such mediators was based on
the concept that the profound and multifaceted genetic reprogramming associated with
malignant cell transformation requires the cooperation of multiple oncogenic
mutations (Hahn et al., 1999; Land et al., 1983), and that this transition is driven by
synergistic modulation of downstream mediators (Lloyd et al., 1997; Sewing et al.,
1997; Xia and Land, 2007). This
class of functionally diverse and apparently unconnected genes was called
cooperation response genes (CRGs). Functional analysis of CRGs revealed that
>50% of the CRGs tested play an essential role in the cancer phenotype (McMurray et al., 2008) and that specific CRGs
represented newly identified, cancer cell-specific vulnerabilities (Kinsey et al., 2014; Smith and Land, 2012; Smith et al.,
2016). Given the functional diversity of CRGs and the robust and
redundant nature of regulatory circuitry in mammalian cells, this high frequency of
impact on the cancer phenotype by individual CRG perturbations is contrary to
expectation, unless CRGs act in a concerted manner. Our work demonstrates that CRGs
act within a strong network of unexpected genetic interdependencies that appears
critical to the robustness of the malignant phenotype.
RESULTS
CRGs contribute to tumor formation capacity
Genetic perturbation of individual CRGs in cells transformed by
p53175H and RasV12 (mp53/Ras cells), aimed to
re-establish levels of gene expression found in non-transformed parental young
adult mouse colon (YAMC) cells, results in a reduction of tumorigenicity in the
majority of cases (McMurray et al.,
2008). The extensive analyses presented here recapitulate our earlier
results. We now show an entirely new and much larger set of 49 CRG perturbations
(Figures 1, S1, and S2). Of these, 25 perturbations
resulted in a significantly reduced tumor formation capacity of the target cells
when implanted into immunocompromised mice. Together with our previous work,
these data demonstrate that 43 of the 75 CRGs tested (57%, adjusted p <
0.1) are critical to the cancer phenotype, including 2 perturbations that
increase tumor size (Figure 1A).
Figure 1.
Impact of CRG perturbations on tumor formation
(A) Waterfall plot indicates percentage change in endpoint tumor volume
of allografts following perturbation of individual CRGs in mp53/Ras cells.
Perturbations significantly altering tumor size, as compared to matched controls
are shown in gold (p < 0.1, Wilcoxon signed-rank test with
Benjamini-Hochberg adjustment). Others are shown in gray. Arrows indicate genes
chosen for inclusion in gene regulatory network modeling.
(B) Pie charts indicate the proportions of functional gene annotations
according to the Gene Ontology database for all CRGs, tumor-inhibitory CRGs, and
classical oncogenes and tumor suppressors, respectively. Colors signify
biological processes, as indicated.
(C) Scheme summarizes the cellular localization and cell biological
functions of proteins encoded by CRGs, colored as in (B). CRGs chosen for
inclusion in gene regulatory network analysis are indicated by bold, underlined
text.
CRGs regulate diverse hallmarks of the malignant phenotype. The
distribution of associated cell processes among those genes with a demonstrated
role in tumor regulation is highly similar to the distribution across all CRGs
but distinct from known oncogenes and tumor suppressors (Figure 1B). The genes whose perturbation affects tumor
formation vary in their localization from nuclear to intracellular to
extracellular function and are associated with numerous regulatory pathways
(Figure 1C). Notably, many of the CRGs
that are implicated in the control of tumor formation capacity are associated
with processes such as metabolism and cell adhesion, regulators of which are not
detected among frequently mutated oncogenes and tumor suppressor genes. Thus, we
consider the CRGs as critical mediators of oncogene cooperation that transmit
and carry out the effects of those oncogenic mutations.
Individual CRG perturbations affect expression of other CRGs
CRGs function in many diverse pathways and mechanisms (Figures 1B and 1C), and interrelationships between most of these genes have not been
previously reported. In contrast, the high frequency by which CRG perturbations
affect the cancer phenotype and the manner in which the CRGs themselves were
identified, by virtue of their synergistic response to multiple oncogenic
mutations, is consistent with the idea that CRGs function within a regulatory
gene network. We thus investigated the extent of mutual control of gene
expression among 20 CRGs, which were selected to represent the spectrum of
functional classes and tumor inhibitory effects observed in the full CRG set
(indicated in Figures 1A and 1C). Using a set of mp53/Ras cell populations, each
harboring 1 of 20 individual CRG perturbations, we compared the mRNA expression
profiles of perturbed cells with corresponding controls to detect changes in the
expression of the selected CRGs. These perturbations varied in their effects on
tumor growth (Figure 1A) and were chosen
from genes up- or downregulated in transformed mp53/Ras cells as compared to the
parental YAMC cells (Figure 2A). Expression
profiles for all 20 CRGs were measured for multiple independent replicates of
each CRG perturbation, revealing widespread changes in gene expression in the
CRG cohort upon perturbation of individual CRGs (Figure 2B).
Figure 2.
CRG connectivity
(A) Bar graph showing the expression levels of each CRG selected for
network analysis, as compared to their baseline expression in non-transformed
parental YAMC cells. Genes upregulated in mp53/Ras versus YAMC are shown in red,
while downregulated CRGs are shown in blue. Measurements were made by TaqMan
Low-Density Array (TLDA) for multiple replicates of each gene perturbation.
(B) Probability of change in expression of each CRG in response to
individual CRG perturbations indicated, computed based on mRNA levels detected
on TLDA for multiple replicates of each gene perturbation. Red denotes a likely
increase in the expression of the downstream gene and blue denotes a likely
decrease in expression. Note that the diagonal indicates the direction of
perturbation, intended to restore levels of gene expression found in parental
YAMC cells for each perturbed gene.
(C) Connectivity graph showing the influence of given CRGs on the
expression of other CRGs. Arrows indicate parent-child connections with
|probability| > 0.5, colored as in (A) to indicate likely direction of
change in the child. Nodes are colored as in Figure 1 to indicate tumor inhibitory effects of perturbation of the
indicated CRGs (gold = inhibitory, gray = no significant change). Placement of
nodes was chosen for comparison to Figure
5.
The high degree of connectivity between the selected CRGs is shown in
Figures 2C and S3A. Key parameters that describe
the interconnectedness of genes in such a graph are the out-degree (i.e., the
number of children of a given gene) and the in-degree (i.e., the number of
parents of a given gene). Of the 20 perturbations, 8 produced changes in the
expression of R5 of the other CRGs, with 1 perturbation, restoration of HoxC13
expression, having the highest out-degree (i.e., affecting the expression of 9
other CRGs) (Figures
S3A and S3B). Only two perturbations had an out-degree of zero, indicating that
they did not produce a change in any other CRG in the network. Moreover, there
was substantial variability in the in-degree among these CRGs, ranging from two
CRGs not responding to any other perturbations, to the one CRG, Sfrp2, with the
highest in-degree, responding to the perturbation of 8 other CRGs (Figures S3A and S3C). These results
highlight the remarkably high connectedness of these 20 CRGs. Given the numerous
interactions between this subset of CRGs, we needed to move beyond simple
connectivity analyses to prioritize interactions for biological testing. Thus,
we used statistical modeling to understand the flow of information in this gene
regulatory network and estimate the associated network architecture.
TopNet reveals rules governing attractor states
To identify gene interactions relevant to the malignant phenotype, we
developed TopNet, a ternary network modeling approach based on perturbations
capable of up- or downregulation of gene expression that accounts for
uncertainty in the underlying gene perturbation data. This appeared suitable, as
CRGs comprise both up- and downregulated genes, including genes with very low
expression levels (McMurray et al.,
2008).The estimation of a GRN typically focuses on inferring individual
interactions (edges) between genes and/or proteins (nodes). The collective
behavior of these interactions is then studied as an emergent property of the
low-level network architecture. This is a natural strategy when inferring GRNs
based on the measurement of pairwise interactions—for example,
transcription factor binding (either via ChIP-seq or promoter sequence motifs)
or protein-protein interaction data. However, gene perturbation experiments
remain the most reliable approach to investigate GRNs and to test cellular
responses to interventions (Markowetz and Spang,
2003; Werhli et al., 2006;
Zak et al., 2003). The gene
expression changes in response to perturbation provide information not on the
direct interactions between genes but rather on the altered steady-state
expression produced by the perturbation. In other words, the estimation
procedure can be framed as a search through the vast space of network models for
those that produce the observed data (Almudevar
et al., 2011).To assess potential interdependence between the 20 CRGs selected for
network estimation, we applied the TopNet algorithm, which uses a ternary
network model that accounts for the dynamic, cyclic, and non-linear nature of
GRNs and facilitates the evaluation of uncertainty (Almudevar et al., 2011). Specifically, TopNet assumes
that changes in gene expression can be modeled by one of three
states–downregulation (−), baseline expression (0), or
upregulation (+1)–and that the network is governed by deterministic
transition functions that encode the complex regulatory relationships between
genes (examples are shown in Figures
3A–3D). Discretization
captures the majority of information and inherently provides robustness (McCall et al., 2011; Parmigiani et al., 2002; Scharpf et al., 2003; Zilliox and Irizarry, 2007). Additional details of the TopNet
modeling algorithm are described in Method details, and Methods S2 and S3.
Figure 3.
Transition functions encode network behavior
Paths from perturbation of parent node HoxC13 to its possible children
are diagrammed. Four network fits of 100 computed are represented. Marginal
effects (single parent) and joint effects (multiple parents) are shown with
transition function tables. For simplicity of visualization, the maximum
in-degree shown here is 2, but the models applied allow for up to 4 parents to
influence a given child node’s expression. Red nodes indicate that the
gene increases its expression, while blue denotes reduced expression.
Following any given perturbation, the network will eventually reach a
steady state, called an attractor. This attractor may be a new state to which
the cell is driven by the action of the gene perturbation, or it may represent a
return to the baseline state–in this case that of a transformed mp53/Ras
cell. Recall that the experimental data represent a measurement of the
steady-state expression of the cells; therefore, the ability of a network model
to represent experimental data can be assessed by comparing these gene
expression measurements with the attractors produced by the network model (Almudevar et al., 2011). Specifically,
starting from a null model, TopNet randomly proposes small changes in the
network structure and scores the proposed new network based on the similarity
between its attractors and the observed steady-state data (Figure 4; Method details, and Methods S1, S2, and S3). This iterative search
algorithm can proceed quite rapidly because TopNet can easily compute the
attractor for each perturbation and assess the similarity of the resulting
attractors with the corresponding observed steady-state data. Ultimately, TopNet
identifies a network model whose attractors match the observed data well.
Figure 4.
Gene regulatory network reconstruction process
The network modeling algorithm begins with a null network in which no
gene affects any other gene (top left). This produces a set of null attractors
wherein the perturbation of any gene does not produce a change in any other
gene. The network models are scored by summing the absolute value of the
differences between the probabilities of up- or downregulation and the
attractors, which are then standardized by subtracting the best theoretically
possible score. Comparing the null attractors to the calculated probabilities
results in substantial differences, which produce a score of 40.36, indicating
that the network model does not explain the observed data well. Random changes
in topology and/or transition functions are made to the network model and the
process is repeated. Here, we show snapshots of the process after the first
10,000 steps, the next 10,000 steps, then after an additional 1 million steps,
and finally after 1 billion more steps, at which point, the network model
produces attractors that match the probabilities calculated from the data very
well.
Uncertainty quantification allows the identification of high-confidence
features of the network topology
There are often many network models that fit the observed data equally
well, and the model space is huge; for the network model presented here, there
are ~2.48 × 10763 potential networks. To more efficiently
explore this vast model space and avoid becoming trapped in local optima, we
implemented a replica exchange Monte Carlo algorithm (Swendsen and Wang, 1986) that allowed us to
parallelize the search over the nodes of a high-performance computing cluster.
Unlike previous approaches, here, we have modeled and addressed non-random
missing data (McCall et al., 2014), batch
effects, and uncertainty in estimates of differential expression (Methods S1 and S2).Due to the vast number of potential networks, it is exceedingly unlikely
that there exists a single ‘‘best’’ network;
therefore, as previously proposed by Almudevar et
al. (2011), TopNet repeats the estimation procedure many times to
produce a sample of networks from the space of models that explain the observed
data well. This collection of network fits can then be used to quantify the
degree of support for a given aspect of the network by calculating the
proportion of networks in which a given feature or features are present. For
example, based on the 4 network models shown in Figure 3, we could state that HoxC13 is likely a direct parent of
Sms because that relationship is present in all 4 models, whereas we have less
certainty about the relationship between HoxC13 and Id2 because it is only a
direct parent in 3 of the 4 models, remembering that the 4 models shown are only
a subsample of all that were computed.Building upon previous work (Almudevar et
al., 2011), TopNet implements several advances in modeling and
computation (see Method
details). These methodological improvements, including quantification
of uncertainty, produced more reliable estimates of the network topology than
previous methods (Figure
S4 and Methods
S2). In addition, we show that the selected in-degree threshold of 4
is flexible enough to produce network models that explain the observed data well
without being so flexible that even random data can be modeled well (Figure S5 and Methods S2). Finally, we
show general agreement in network topology for sufficiently complex models
(Figure S6 and
Methods S2). These
improvements to data analysis and network modeling were crucial to identify
high-confidence features of the network topology.
Modeling reveals robust topological features of the CRG regulatory
network
The result of the TopNet network modeling procedure is a sample of
network models, all of which explain the observed data well. Based on this
sample of network models, we are able to form inferential statements about
specific network features by averaging across the network models. For example,
in the case of topological relationships, we calculate the proportion of
networks in which a given gene is a parent of another gene. For the CRG network,
we report results based on 100 independent network fits. In Figure 5, we show edges present in at least 80% of the
network models, which represent regulatory relationships that are strongly
supported by the observed data.
Figure 5.
Summary of CRG network topology
The network models produced by 100 independent fits are mapped by
reporting the proportion of networks in which an edge (parent-child
relationship) appears in >80% of the network models. The node color
indicates the effect of perturbation of that node on tumor formation capacity
(as shown in Figure 1 and used in Figure 2, gold = tumor inhibitory
perturbation, gray = no significant change). Experimentally tested epistatic
interactions are indicated by green arrows.
The network model provides both a sparse topology of gene interactions
(Figure 5) and direction of gene
responses, derived from the underlying transition functions (see examples in
Figure 3). Based on this information,
we are able to prioritize genetic perturbation experiments testing particular
gene interactions and their role in maintaining or generating a given biological
state (i.e., the malignant state). Here, we have focused on investigating the
impact of multiple pairwise gene interactions relating to two central players
common to all of the network models, HoxC13, the parent node with the most
children, and Sfrp2, the gene affected by the largest proportion of
perturbations (Figures 2C and S3).
Interdependency of functionally diverse mediator genes
To test whether the CRG parent-child relationships suggested by the
network model reflect functional interdependencies between parent-child pairs,
we chose an example in which perturbation of the upstream gene has a
reproducible biological effect–in this case, inhibition of tumor growth
in vivo. Notably, genetic perturbation (i.e., re-expression
of HoxC13, a homeobox transcription factor) (Godwin and Capecchi, 1998; McMurray
et al., 2008), produced some of the strongest antitumor effects
observed in our experiments. Next, we carried out double perturbations of
parent-child pairs, aimed to limit the impact of perturbing parent gene
expression. For double perturbations, we selected the parent-child relationships
between HoxC13 and its high-confidence children: Sms, the gene encoding spermine
synthase (Pegg and Michael, 2010); Id2, a
gene encoding a member of the helix-loop-helix only and inhibitor of DNA-binding
protein family (Lasorella et al., 2001);
and Sfrp2, encoding a member of the family of secreted Frizzled-related
proteins, antagonists of Wnt signaling (Rattner
et al., 1997) (Figure 6A). Note
that the relationship between HoxC13 and Id4, while specified by the network
topology, is not observed in the connectivity graph, and thus an appropriate
genetic perturbation to test this relationship remains unclear.
Figure 6.
Epistatic CRG interactions affecting cancer phenotype
(A) Diagram of a subnetwork of the GRN shown in Figure 5, whose edges are indicated by solid lines,
with the addition of 2 edges seen only in the connectivity graph but not in the
topological map (dashed lines). Red arrows indicate that perturbation of the
parent gene increases the expression of the child, while blue arrows denote
reduced expression of the child in response to the parent perturbation. Nodes
are colored as in Figures 2 and 5.
(B–G) Boxplots show tumor growth in response to indicated single
and double perturbations targeting parent node HoxC13 and its predicted
children, or child node Sfrp2 and its parents. KD denotes knockdown
perturbations; all other perturbations induce overexpression. Perturbations
significantly decreasing tumor size, as compared to matched controls are
indicated by an asterisk (*p < 0.05, unadjusted Wilcoxon signed-rank
test). Double gene perturbations showing significantly different tumor size as
compared to individual gene perturbations are indicated by a hash mark (#, p
< 0.1, unadjusted Wilcoxon signed-rank test versus each individual
perturbation).
Our experiments testing relationships between HoxC13 and Sms, Id2, or
Sfrp2 demonstrate that changes to the expression of these child nodes are
required for the biological effects of the parent perturbation (Figures 6B–6D and S7).
All of these relationships represent previously unknown genetic
interdependencies, or functional epistatic relationships, critical to the cancer
phenotype. Specifically, we find that the ability of HoxC13 to suppress the
malignant state is curtailed when either low Sfrp2 levels or high Sms or Id2
levels are enforced. While double perturbation of HoxC13 with either Sms or Id2
appears to completely rescue tumor formation, combining HoxC13 re-expression
with Sfrp2 knockdown shows only a partial rescue. This may relate to additional
complexity in the relationship between HoxC13 and Sfrp2, with other molecules
playing a role in the phenotypic output, or it may relate to technical
limitations of the experimental protocol. Our results, however, are sufficiently
robust that they indicate biologically relevant gene interactions.The GRN presented suggests specific interactions that may help to
maintain the robustness of the network and ultimately the malignant state of the
cells. However, parent-child relationships beyond those identified by modeling
the GRN can be seen by direct examination of the impact of a perturbation on the
other genes in consideration (Figures 2 and
6A). Specific to HoxC13 as a parent,
gene expression profiling reveals that the expression of Plac8, encoding a
lysosomal protein critical for autophagosome-lysosome fusion in cancer cells
(Galaviz-Hernandez et al., 2003; Kinsey et al., 2014), and Rgs2, which
encodes a family member of regulators of G protein-coupled receptors (Kannangai et al., 2007), are each decreased
by HoxC13 perturbation. However, these interactions are not necessary for the
GRN to explain the steady-state observations of expression of all of the genes
in the network and thus are not frequently observed in the topological mapping
of the GRN. To test the relative importance of interactions identified by
network modeling versus those observed by gene expression profiling, we
generated double perturbations re-expressing HoxC13 together with either Plac8
or Rgs2. Remarkably, neither of these combinations revealed the dependence of
the antitumor effects of HoxC13 on Plac8 or Rgs2 (Figures 6E and S7), although enforcing both HoxC13 and Plac8 expression had an
enhanced antitumor effect, an interesting observation worthy of study in a
different context. Testing of the biological dependence of HoxC13 on potential
interactions suggested by the network model and those observed exclusively in
the connectivity graph shows that the GRN prioritizes biologically relevant
genetic interdependencies (Figure 5) among
the densely populated space of relationships found in the connectivity graph
(Figure 2B).Furthermore, we examined whether relationships between Sfrp2 and its
parents, HoxC13, Notch3, and Wnt9a (Figures
5 and 6A) are critical to the
malignant phenotype. As described above, the re-expression of HoxC13 does not
significantly inhibit tumor growth when low Sfrp2 levels are maintained.
Similarly, the tumor inhibitory effect of Notch3 expression is suppressed when
Sfrp2 expression is maintained at low levels (Figures 6F and S7). The re-expression of Wnt9a has a small but statistically
significant effect on tumor size (Figures
1A and 6G). In contrast to
HoxC13 and Notch3 re-expression, increased Wnt9a expression decreases Sfrp2 mRNA
levels (Figure 2A). When Sfrp2 and Wnt9a
expression were both increased, we observed a modest but significant reduction
in tumor growth compared to controls (p = 0.029, unadjusted Wilcoxon signed-rank
test) but increased tumor growth compared to Sfrp2 expression alone (Figures 6G and S7). This observation is
consistent with the idea that Sfrp2 and Wnt antagonize each other both at the
levels of gene expression and Wnt signaling activity (Rattner et al., 1997).Here, we demonstrate that CRGs function within a gene regulatory network
that contributes to the malignant phenotype. Our results reveal that numerous
tumor regulatory interactions are detected via attractor-based ternary GRN
modeling through the TopNet algorithm. Notably, high-confidence topological
features of the network model are frequently found to be biologically relevant,
and thus the network model effectively pinpoints previously unknown gene
interdependencies critical to the cancer phenotype.
DISCUSSION
Here, we show that apparently unrelated mediators of the cancer phenotype
are linked by a strong network of genetic interactions and thus maintain robustness
of the malignant state. Critical to this discovery, we developed and applied TopNet,
an approach to GRN estimation capable of identifying a biologically relevant
topological model of highly complex genetic interdependencies. In contrast to the
connectivity graph, TopNet can accurately model cellular responses to perturbations,
produce testable hypotheses regarding non-linear multi-gene interactions, and
identify control points in the network architecture. This involves searching through
a massive number of potential network models to identify those instances consistent
with the observed data. By combining the information from many network models,
TopNet prioritizes aspects of the GRN that are most strongly supported by the
experimental data.TopNet is capable of pinpointing key architectural features of cancer cells.
The application of TopNet to genetic perturbation experiments involving 20 CRGs has
identified high-confidence multi-gene interactions involved in the maintenance of
the cancer phenotype. Furthermore, experimentation confirmed all of the tested
interactions identified by TopNet as previously unknown genetic interdependencies.
In fact, all of the relationships between HoxC13 and its children Sms, Sfrp2, or
Id2, reported here, were experimentally confirmed to be contributing to the
tumor-inhibitory activity of HoxC13, which is downregulated in transformed cells as
compared to non-transformed parental controls (McMurray et al., 2008). Our data suggest that HoxC13 inhibits the
pathways downstream of Sms and Sfrp2 (i.e., polyamine metabolism and Wnt signaling)
and Id2 function, all of which is consistent with our observation of reduced tumor
growth capacity of transformed cells in which HoxC13 expression has been elevated to
resemble levels observed in non-transformed parental cells. Similarly, we
demonstrated functional relevance for all of the interactions between Sfrp2 and its
high-confidence parents Notch3 and Wnt9A, indicating that the tumor-inhibitory
activity of Notch3 is dependent on Sfrp2 and that Wnt9A can downregulate Sfrp2
expression and thus neutralize Sfrp2-mediated tumor inhibition, consistent with the
activation of Wnt signaling. We expect that further experimentation, especially
combined perturbations of >2 genes, would likely reveal additional
complexity. The feasibility of such studies, however, is limited by current
technologies.In addition to identifying high-confidence dependencies via examination of
the network topology, TopNet provides the corresponding transition functions that
determine the nature of dependencies between parent and child nodes (e.g., Figure 3). However, one should not assume that
either the summarized network topology or the transition functions necessarily
represent direct mechanistic relationships in the cell. In fact, the functional
diversity of the 20 CRGs examined in this work suggests that most of the identified
interactions and dependencies act through unmeasured intermediaries. Nevertheless,
armed with information on critical interactions in the cell, one can then dedicate
effort to understanding such interactions at a mechanistic level.A major strength of our work is the use of data from gene perturbation
experiments as the input for TopNet modeling. While a number of network modeling
approaches have been proposed to identify and understand epistatic interactions in
cancer cells, many of these modeling approaches are hindered by being observational
in nature, as they rely on correlation structure in the data produced by natural
biological covariation (van de Haar et al.,
2019). In contrast, network modeling based on perturbation experiments
allows prioritization of cause-effect relationships before experimental validation.
This has led to increased interest in perturbation experiments as a means of
modeling biological systems, such as the use of Perturb-seq (Adamson et al., 2016; Dixit et al., 2016; Jaitin et al.,
2016; Schraivogel et al., 2020) to
measure the effect of perturbations at single-cell resolution.While high-throughput perturbation approaches such as Perturb-seq are
currently limited by their reliance on gene extinction as the means for gene
perturbation, as well as the reduced sensitivity of single-cell RNA sequencing
technology, we are intrigued by the possibility of adapting our GRN estimation
methodology to larger and potentially more complex networks. The extension of TopNet
to such data would allow one to move beyond associative analyses and identify
non-linear gene interactions underlying cancer cell robustness. Several
computational and statistical challenges will need to be addressed in this context,
such as the increased feature space, decreased sensitivity, and unidirectional
perturbations. In the near term, targeted Perturb-seq (Schraivogel et al., 2020), in which some of these
challenges are lessened, is ideally suited for initial methods development and
testing. Our general modeling approach has the potential to extract rich biological
interactions from these data and open new avenues of biological inquiry.The complexity and strength of the CRG network revealed by TopNet are both
remarkable and unexpected. CRGs comprise a diverse set of genes functioning in a
multitude of diverging pathways involved in regulating cell signaling, metabolism,
gene expression, adhesion, and survival. In addition, biologically relevant genetic
interactions between any of the 20 CRGs investigated here, to our knowledge, have
not been reported. Our observations thus suggest a fundamental role for genetic
networks in supporting the robustness of the malignant state and reveal network
stability per se as a target for cancer intervention. Combination perturbations that
disrupt network robustness may be needed for successful intervention. One could
imagine that such an approach would be advantageous in the context of emerging
treatment resistance. In addition, the examination of gene regulatory network
architecture also has the potential to uncover an unexpected wealth of cancer cell
vulnerabilities controlled by non-mutated genes and ultimately identify druggable
targets to guide novel intervention strategies. For example, targeting a metabolic
enzyme such as Sms could serve as a convenient alternative to intervention at the
level of HoxC13, as inhibition of Sms expression is similarly effective in tumor
inhibition as HoxC13 reactivation. In addition, molecular mechanisms underlying the
Sms-HoxC13 interaction would be worth exploring further. TopNet, as shown here, thus
can effectively be used for the identification of suitable targets accessible to
pharmacological intervention in a largely non-druggable GRN environment.
STAR★METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should
be directed to and will be fulfilled by the lead contact, Matthew McCall
(matthew_mccall@urmc.rochester.edu).
Materials availability
Plasmids generated in this study are available upon request.
Primer sequences and TaqMan probe sets are tabulated in Tables S1–S3.
There are restrictions to the availability of YAMC cells due to
materials transfer agreement.All qPCR data are available in this paper’s supplemental
information. Tumor volume data reported in this paper
will be shared by the lead contact upon request.All original code to reproduce all statistical analyses is
available in the crgnet R package included in this paper’s
supplemental information as Methods S1 and the
ternarynet R/Bioconductor package, available at: https://bioconductor.org/packages/release/bioc/html/ternarynet.html.The pseudocode for the network modeling algorithm is
supplied in Methods S3, while a reproducible workflow of the
analyses in this paper is included as the vignette of the crgnet
package (Methods
S1).Any additional information required to reanalyze the data
reported in this paper is available from the lead contact upon
request.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cells
Low-passage polyclonal young adult mouse colon (YAMC) cells were
originally derived from males and females of the Immorto-mouse
(H-2Kb /tsA58 transgenic mouse) and gifted to the Land
laboratory by a gift from R. Whitehead and A.W. Burgess (D’Abaco et al., 1996; Whitehead et al., 1993). YAMC cells express
temperature sensitive simian virus 40 large T (tsA58) under the control of
an interferon γ inducible promoter were maintained at the permissive
temperature (33°C) for large T in the presence of interferon γ
to support conditional immortalization in vitro. This
permits expansion of the cells in tissue culture. In contrast, exposure of
YAMC cells to the non-permissive temperature for large T (39°C) in
the absence of interferon g leads to growth arrest followed by cell death,
indicating the absence of spontaneous immortalizing mutations in the cell
population. The cells were cultured on Collagen IV-coated dishes
(1μg/cm2 for 1.5 hr at room temp; Sigma) in RPMI 1640
medium (Invitrogen) containing 10% (v/v) fetal bovine serum (FBS) (Hyclone),
1 × ITS-A (Invitrogen), 2.5 μg/ml gentamycin (Invitrogen), and
5 U/ml interferon γ (R&D Systems). All experiments testing the
effects of RasV12 and p53175H were carried out at the
non-permissive temperature for large T function (39°C) and in the
absence of interferon γ.
Animals for allograft tumor assays
Tumor formation was assessed by sub-cutaneous injection of perturbed
or control cells into nulliparous female adult CD-1 nude mice
(Crl:CD-1-Foxn1nu, Charles River Laboratories, purchased at 6
– 8 weeks of age). Animals were housed 3 – 5 per cage in
micro-isolator conditions, necessitated by their immunocompromised status.
Cells were implanted at a dosage of 5×105 per injection in
RPMI 1640 media with no additives. For each gene perturbation three to six
biological replicates were performed; 4 – 8 injections were performed
for each replicate of perturbed cells and vector controls. Two cell
implantations were performed on each mouse in the left and right flanks;
animals received exclusively perturbed cells or vector control cell
populations. Tumor size was measured by caliper at 2, 3 and 4 weeks
post-injection. Tumor volume was calculated by the formula volume =
(4/3)πr3, using the average of two radius
measurements. Tumor reduction was calculated based on the difference of
average tumor volume following each gene perturbation as compared to the
vector control tumors from the same batch of perturbations. Statistical
significance of difference in average tumor size was calculated by the
Wilcoxon signed-rank test with Benjamini-Hochberg multiple testing
correction (Hochberg and Benjamini,
1990).
METHOD DETAILS
Young adult mouse colon (YAMC) cells and derivation of transformed
cells with multiple oncogenic lesions were derived and are used as described
elsewhere (McMurray et al., 2008;
Whitehead et al., 1993). Briefly,
polyclonal cell populations harboring mutant forms of p53
(p53175H) and Ha-Ras (RasV12) (abbreviated as
mp53/Ras) were derived by retroviral infection of low-passage polyclonal
YAMC cells. YAMC cells were cultured on Collagen IV-coated dishes (1
μg/cm2 for 1.5 hr at room temp; Sigma) in RPMI 1640
medium (Invitrogen) containing 10% (v/v) fetal bovine serum (FBS) (Hyclone),
1 × ITS-A (Invitrogen), 2.5 μg/ml gentamycin (Invitrogen), and
5 U/ml interferon γ (R&D Systems). All experiments testing the
effects of genetic perturbations were carried out at the non-permissive
temperature for large T function (39°C) and in the absence of
interferon γ.
Genetic perturbation of gene expression
cDNAs expressed via pBabe retroviral vectors or pLenti6 lentiviral
vectors and shRNA delivered via pSuper-retro retroviral vectors or pLKO
lentiviral vectors were used to generate gene perturbations. These were
tested by comparison of RNA expression levels in empty vector-infected cells
and cells subjected to gene perturbation via SYBR Green qPCR with
gene-specific primers.cDNA clones were obtained from the IMAGE consortium collection,
distributed by Open Biosystems, or PCR-cloned from murine cDNA using
sequence-specific primers except for murine Jagged2 (Jag2) and
Notch3-intracellular domain (Notch3-ICD) (gifts of Dr. L. Milner). All cDNAs
were sequence-verified prior to use and were cloned into the retroviral
vector pBabe-puro (Morgenstern and Land,
1990).shRNA molecules were designed using an algorithm (Yuan et al., 2004), available at http://sirna.wi.mit.edu/home.php, for
cloning into the pSuper-retro vector (Oligoengine) or purchased as clones in
the pLKO vector. Target sequences for pSuper-retro cloning were synthesized
as forward and reverse oligonucleotides (IDT), which were annealed and
cloned into the vector. For each upregulated gene, we identified two or
three independent shRNA target sequences yielding at least 50% reduction in
gene expression with the goal to guard against off-target effects (Figure S2). For this
purpose between four and six shRNA targets for each gene were tested. In the
case of SerpinB2, only one shRNA target sequence yielded appropriate levels
of knock-down, reducing mRNA expression to levels comparable to those in
YAMC cells.Retroviruses were produced following transient transfection of
ΦNX-eco cells, while lentiviruses were produced following transient
transfection of 293T cells. Infections were carried out in media with 8
μg/mL polybrene. Selection with 5 μg/mL puromycin, and where
applicable, 200 μg/mL hygromycin B, was used to generate polyclonal
populations of cells stably expressing the indicated cDNAs and/or shRNAs. To
test reproducibility of the effects of CRG gene perturbations on tumor
formation 2–4 independent replicates of such cell populations were
derived (Figure
S2).For combined perturbations, cDNA or shRNA for one gene in the pair
was sub-cloned into the appropriate pBabe-hygro or pSuper-retro-hygro
retroviral vector, allowing for consecutive, independent selection for each
gene perturbation introduced.
Quantitation of gene perturbation
The efficiency of gene perturbations was tested by comparison of RNA
expression levels in empty vector-infected mp53/Ras cells and cells
subjected to gene perturbation. Re-expression or knock-down was also
compared with the respective levels of RNA expression in YAMC control cells.
For collection of RNA, mp53/Ras cells were grown at the 39°C for 2
days, followed by serum withdrawal for 24 hr. Total RNA was extracted from
cells following the standard RNeasy Mini Kit protocol for animal cells, with
on-column DNase digestion (QIAGEN).SYBR Green-based quantitative PCR was run using cDNA produced as
described below for TLDA, with 1x Bio-Rad iQ SYBR Green master mix, 0.2
μM forward and reverse primer mix, with gene-specific qPCR primers
for each gene tested. Primers were identified using the Primer Bank
database15, available at https://pga.mgh.harvard.edu/primerbank/index.html or
designed using the IDT PrimerQuest tool (https://www.idtdna.com/Scitools/Applications/Primerquest/).
Differential gene expression was calculated by the ∆∆Ct
method, described below, using RhoA as an endogenous reference gene.
Reactions were run on the iCycler (Bio-Rad), as follows: 5 min at
95°C, 45 cycles of 95°C for 30 s, 58 to 61°C for 30 s,
68 to 72°C for 45 s to amplify products, followed by 40 cycles of
94°C with 1°C step-down for 30 s to produce melt curves.As expected, the magnitude of perturbation varies between cDNAs and
replicates, and falls into the following groups. For tumor-inhibitory CRGs,
all replicates express cDNAs at levels below, at or moderately above YAMC
mRNA expression levels with the exception of Pvrl4, for which we cannot
exclude the possibility that its tumor inhibitory effects are due to
overexpression of the cDNA. For non-tumor-inhibitory CRGs, cDNA expression
levels were found at or above the levels of the corresponding YAMC mRNAs
(Figure
S2).
Allograft assays
Tumor formation was assessed by sub-cutaneous injection of cells
into CD-1 nude mice (Crl: CD-1-Foxn1nu, Charles River
Laboratories). Murine mp53/Ras cells were grown at 39°C for 2 days
prior to injection and implanted via sub-cutaneous injection of
5×105 mp53/Ras in RPMI 1640 with no additives. For
each gene perturbation three to six biological replicates were performed; 4
– 8 injections were performed for each replicate of perturbed cells
and vector controls. Tumor size was measured by caliper at 2, 3 and 4 weeks
post-injection. Tumor reduction was calculated based on the difference of
average tumor volume following each gene perturbation as compared to the
vector control tumors from the same batch of perturbations. Statistical
significance of difference in average tumor size was calculated by the
Wilcoxon signed-rank test with Benjamini-Hochberg multiple testing
correction (Hochberg and Benjamini,
1990).
TLDA QPCR
Expression values for each of the CRGs were derived from TaqMan
Low-Density QPCR Array (TLDA) data. The TaqMan Low-Density Array (Applied
Biosystems) consists of TaqMan qPCR reactions representing the cooperation
response genes and control genes (18S rRNA, GAPDH) in a microfluidic card. TLDA
were used to measure gene expression patterns following genetic perturbation of
CRGs. To generate cDNA for qPCR analysis, quadruplicate samples of mRNA from
mp53/Ras cells harboring CRG perturbations were isolated and 10 μg/sample
were mixed with 1x SuperScript II reverse transcriptase buffer, 10 mM DTT, 400
μM dNTP mixture, 0.3 ng random hexamer primer, 2 μL RNaseOUT RNase
inhibitor and 2 μL of SuperScript II reverse transcriptase in a 100
μL reaction (all components from Invitrogen). RT reactions were carried
out by denaturing RNA at 70°C for 10 minutes, plunging RNA on to ice,
adding other components, incubating at 42°C for 1 hour and heat
inactivating the RT enzyme by a final incubation at 70°C for 10
minutes.For each sample, 82 μL of cDNA was combined with 328 μl of
nuclease free water (Invitrogen) and an equal volume of TaqMan Universal PCR
Master Mix No AmpErase UNG (Applied Biosystems). The mixture was loaded into
each of 8 ports on the card at 100 μL per port. Each reaction contained
forward and reverse primer at a final concentration of 900 nM and a TaqMan MGB
probe (6-FAM) at 250 nM final concentration. The cards were sealed with a TaqMan
Low-Density Array Sealer (Applied Biosystems) to prevent cross-contamination.
The real-time RT-PCR amplifications were run on an ABI Prism 7900HT Sequence
Detection System (Applied Biosystems) with a TaqMan Low Density Array Upgrade.
Thermal cycling conditions were as follows: 2 min at 50°C, 10 min at
94.5°C, 40 cycles of 97°C for 30 s, and annealing and extension at
59.7°C for 1 minute. Each individual replicate cDNA sample was processed
on a separate card.
QUANTIFICATION AND STATISTICAL ANALYSIS
TopNet: Our approach to topological network analysis
consists of the following specific steps:Non-detect imputation through application of an ECM algorithm to
non-random missing data (McCall et al.,
2014)Calculation of z-scores for each perturbation based on
∆∆Ct values to standardize expression fold changes.Calculation of the probability of up- or downregulation in
response to each perturbation by fitting a uniform/normal/uniform
mixture model to the z-scores calculated in step 2.Identification of ternary network models (Almudevar et al., 2011) which minimize the
L1 distance between the network attractors and the
estimated probabilities from step 3. For the network model presented in
this manuscript there are approximately 2.48 × 10763
potential networks, so we use a specific implementation of replica
exchange Monte Carlo (Swendsen and Wang,
1986) to search the model space in parallel across the nodes
of a high-performance computing cluster.Summarization of network topology by reporting parent-child
relationships that are present in the vast majority, here at least 80
out of 100, of independent network fits.Generation of testable hypotheses regarding interdependencies
among genes in the network via examination of the network topology and
the biological effects of perturbation of the genes in the network, in
this manuscript inhibition of tumor growth in vivo.Several of these steps are described in more detail in the following
subsections.
Preprocessing of gene expression data
Gene expression values were derived using SDS 2.0 and 3.0 software
packages (Applied Biosystems). As previously reported in other datasets, we
observed a strong dependency between the proportion of non-detects (those
reactions failing to produce fluorescence values above a certain threshold)
and the average observed expression value. Non-detects were treated as
non-random missing data and imputed using the R/Bioconductor package
non-detects (McCall et al., 2014).
The R/Bioconductor HTqPCR package (Dvinge and
Bertone, 2009) was used to normalize the data to Becn1
expression, which was shown to have relatively low variability across
replicate control samples, and accounts for much of the technical
variability between samples (detailed in Methods S1).Differential gene expression was calculated by the
∆∆Ct method. Briefly, using threshold cycle (Ct) for each
gene, change in gene expression was calculated for each sample comparison by
the formulae:In order to incorporate uncertainty into subsequent analyses, we
estimate the probability that a gene is up- / downregulated in response to
each perturbation by fitting a uniform / normal / uniform mixture model to
approximate z-scores for each perturbation. This approach is similar to the
probability of expression (POE) algorithm (Parmigiani et al., 2002).
Gene regulatory network reconstruction
The methodology used to estimate the GRN uses the theoretical
framework developed in Almudevar et al.
(2011). One aspect of TopNet’s approach to GRN modeling is
a method of scoring the networks based on a measure of uncertainty in the
differential expression estimates. Specifically, the network models are
scored by summing the absolute value of the differences between
probabilities of up- or downregulation and the attractors and subtracting
the best theoretically possible score. This allows the network model to give
more weight to data points with higher certainty (probabilities closer to 0
or 1). As an added benefit, the ability to produce non-integer network
scores eased transitions between network models and significantly decreased
computational time. Another aspect of TopNet is an improved search algorithm
that uses replica exchange Monte Carlo (Swendsen and Wang, 1986) to parallelize the search across many
nodes of a high-performance computing cluster. This allowed us to avoid
becoming trapped in local optima and to fit larger more complex networks in
substantially less time. Results presented in this manuscript are based on
network fits each run for 1,000,000,000 cycles in parallel across 20
processors with temperatures ranging from 0.001 to 1. We have also examined
the transition functions, attractors, and trajectories all stored in the
fits object, available within Methods S1.
KEY RESOURCES TABLE
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Bacterial and virus
strains
pBabe retroviral vector containing either
puromycin, blasticidin or hygromycin resistance genes
Authors: Conan Kinsey; Vijaya Balakrishnan; Michael R O'Dell; Jing Li Huang; Laurel Newman; Christa L Whitney-Miller; Aram F Hezel; Hartmut Land Journal: Cell Rep Date: 2014-05-01 Impact factor: 9.423
Authors: A Rattner; J C Hsieh; P M Smallwood; D J Gilbert; N G Copeland; N A Jenkins; J Nathans Journal: Proc Natl Acad Sci U S A Date: 1997-04-01 Impact factor: 11.205
Authors: Adam A Margolin; Ilya Nemenman; Katia Basso; Chris Wiggins; Gustavo Stolovitzky; Riccardo Dalla Favera; Andrea Califano Journal: BMC Bioinformatics Date: 2006-03-20 Impact factor: 3.169
Authors: Daniel Schraivogel; Andreas R Gschwind; Jennifer H Milbank; Daniel R Leonce; Petra Jakob; Lukas Mathur; Jan O Korbel; Christoph A Merten; Lars Velten; Lars M Steinmetz Journal: Nat Methods Date: 2020-06-01 Impact factor: 28.547