Reprogramming of cellular metabolism is an emerging hallmark of neoplastic transformation. However, it is not known how the expression of metabolic genes in tumors differs from that in normal tissues, or whether different tumor types exhibit similar metabolic changes. Here we compare expression patterns of metabolic genes across 22 diverse types of human tumors. Overall, the metabolic gene expression program in tumors is similar to that in the corresponding normal tissues. Although expression changes of some metabolic pathways (e.g., upregulation of nucleotide biosynthesis and glycolysis) are frequently observed across tumors, expression changes of other pathways (e.g., oxidative phosphorylation) are very heterogeneous. Our analysis also suggests that the expression changes of some metabolic genes (e.g., isocitrate dehydrogenase and fumarate hydratase) may enhance or mimic the effects of recurrent mutations in tumors. On the level of individual biochemical reactions, many hundreds of metabolic isoenzymes show significant and tumor-specific expression changes. These isoenzymes are potential targets for anticancer therapy.
Reprogramming of cellular metabolism is an emerging hallmark of neoplastic transformation. However, it is not known how the expression of metabolic genes in tumors differs from that in normal tissues, or whether different tumor types exhibit similar metabolic changes. Here we compare expression patterns of metabolic genes across 22 diverse types of humantumors. Overall, the metabolic gene expression program in tumors is similar to that in the corresponding normal tissues. Although expression changes of some metabolic pathways (e.g., upregulation of nucleotide biosynthesis and glycolysis) are frequently observed across tumors, expression changes of other pathways (e.g., oxidative phosphorylation) are very heterogeneous. Our analysis also suggests that the expression changes of some metabolic genes (e.g., isocitrate dehydrogenase and fumarate hydratase) may enhance or mimic the effects of recurrent mutations in tumors. On the level of individual biochemical reactions, many hundreds of metabolic isoenzymes show significant and tumor-specific expression changes. These isoenzymes are potential targets for anticancer therapy.
All tumors share a common phenotype of uncontrolled cell proliferation. To support the
synthesis of biomass components and to generate energy required for cellular growth, cancer cells
have to reshape the regulatory and functional properties of their metabolic networks. Over 80 years
ago, Otto Warburg[1] identified a shift from
oxidative to fermentative metabolism as a common physiological trait of tumor cells. Following this
early insight into cancer metabolism, the main focus of cancer research generally shifted towards
the analysis of signaling, gene-regulatory and genetic perturbations in various tumors[2, 3]. Recently, however,
there has been a resurgence of interest in cancer metabolism[4-6]. An important factor contributing to this
renaissance is the observation that many signaling pathways altered in cancer are key regulators of
the human metabolic network[5]. In addition, the
therapeutic potential of metabolic targets in cancer has also been rediscovered[7, 8].Taking advantage of a large compendium of gene expression profiles that has been
accumulated over the last decade[9, 10], in this study we comprehensively analyzed tumor-induced changes in mRNA
expression of human metabolic genes across 22 diverse cancer types. To minimize confounding
metabolic adaptations that may arise from tissue culture conditions, we analyzed only microarray
data obtained from biopsies of primary tumors. We compared gene expression in tumors and
corresponding normal tissues at several conceptual levels of biochemical organization: at the global
network level, at the level of individual biochemical pathways and at the level of single enzymatic
reactions. The focus on the human metabolic network and the analysis of the large collection of
tumor and normal samples allowed us to gain statistical power and establish significance for many
expression patterns not reported previously.
RESULTS
Global changes in metabolic gene expression
To understand metabolic gene expression in different cancers, we assembled a
comprehensive collection of more than 2500 microarray measurements spanning 22 different tumor types
(Online Methods and Supplementary Table 1).
Although we analyzed only expression data obtained using the most comprehensive human expression
array platform (HG U133 Plus 2.0; Supplementary
Table 2), comparisons of data for the same tumor types obtained from independent studies and
with different microarray platforms (Supplementary
Table 3) showed a high correlation of expression changes (average Spearman’s rank
correlation coefficient = 0.63), confirming the generality of the observed expression patterns (see
Supplementary Fig. 1).Using the assembled expression compendium, we first investigated the global shifts in
metabolic gene expression between and within different cancers and their corresponding normal
tissues. For this analysis we used 1421 human genes assigned to metabolic pathways in the KEGG
database[11]. Using two different measures of
divergence between a pair of expression profiles[12], the Euclidean distance and the correlation distance (Online Methods), we compared
global expression patterns between tumors and normal tissues (Supplementary Table 2). In order to account for
batch effect arising due to variations in laboratory conditions and measurements, the estimated
batch contribution was subtracted from expression distances between expression profiles measured in
different studies (Online Methods).Relative differences between the distributions are consistent for the two metrics of
expression divergence (Fig. 1 and Supplementary Fig. 2). The expression distance
between tumors and corresponding normal tissues
(Tumor-Normal) is significantly larger
than the distance between different samples of the same normal tissues
(Normal-Normal; Mann-Whitney U test
P-value = 10−8; Fig. 1) or between different
samples of the same tumors (Tumor-Tumor;
P-value = 4*10−7). The distance
Tumor-Normal, however, is significantly
smaller than the distance between different tumors
(Tumor-Tumor; P-value =
2*10−7), which in turn is significantly smaller than the distance between
different normal tissues (Normal-Normal;
P-value < 2*10−16). The average expression distance between two different
tumors is ~82% of the average distance between two different normal tissues, while the
distance between a tumor and a corresponding normal tissue is ~63% of the distance between
two different normal tissues. Consequently, although the metabolic expression patterns in different
tumors become more similar than in corresponding normal tissues, the general metabolic expression
program of the original tissue is mostly retained in tumors.
Figure 1
Global differences in metabolic gene expression between tumors and normal tissues. Colors
represent distributions of the Euclidean (RMSD) expression distance between different samples of
identical normal tissues (Normal-Normal,
magenta), different samples of identical tumors
(Tumor-Tumor, cyan), tumors and
corresponding normal tissues
(Tumor-Normal, blue), different tumors
(Tumor-Tumor, green), and different normal
tissues (Normal-Normal, red). The
distributions shown in the figure were binned for display purposes only. Inset summarizes the
average distances between pairs of tissues as a percentage of the average distance between two
different normal tissues.
Expression changes of individual biochemical pathways
We next analyzed the expression changes associated with individual biochemical pathways
defined in the KEGG database[11] (see Supplementary Table 5 for pathway information and
numbering). To identify the patterns of up- and down-regulation for each metabolic pathway, we
determined the significance of its expression changes in the tumor samples relative to the
corresponding normal samples using the Wilcoxon signed-rank test, adjusted for multiple hypothesis
testing (Online Methods). Based on this analysis we calculated the average fraction of tumor samples
in which each metabolic pathway was significantly (FDR-corrected P-value < 0.05) up-regulated
and down-regulated across 22 cancer types (Fig. 2).
To assess the statistical significance of the observed pathway behavior, we computed the null
distributions of () and () values for metabolic pathways highlighted in Fig. 2 and demonstrated statistical significance of the observed patterns (Online
Methods and Supplementary Fig. 3). The
general expression patterns observed with the KEGG pathways were similar to results obtained using
the BioCyc pathway definitions[13] (Supplementary Fig. 4a; Supplementary Table 6), suggesting the robustness
of the results with respect to alternative pathway definitions.
Figure 2
Expression of individual metabolic pathways in tumors. The biochemical pathways defined in the
KEGG database (see Supplementary Table 5
for pathway numbering) are shown in the coordinates of (, horizontal axis) and (, vertical axis), where is the average fraction of tumor samples in which a pathway is
significantly up-regulated, and is the average fraction in which a pathway is significantly
down-regulated. The averages and were calculated across all 22 tumors. The up- (down-) regulation
significance was determined using Wilcoxon signed-rank test (FDR-corrected P-value < 0.05,
see Supplementary Fig. 4b for the same
analysis with FDR = 0.2). Several pathways are highlighted using different colors. The dashed lines
demarcate the region where is less than 20% of and are shown for visualization purposes only. Metabolic pathways
without significant expression changes are primarily clustered on the left of the figure. Pathways
that are often significantly up-regulated (high values) occupy positions in the upper right corner, while pathways that
are primarily down-regulated (high values) occupy positions in the lower right corner. Highly
heterogeneous pathways that show, in different tumors, both significant up- and down-regulation are
clustered on the right near zero on the vertical axis.
As expected, pathways responsible for production of biomass components that are essential
for cell division, such as pyrimidine (18) and purine (16) biosynthesis, are significantly
up-regulated in many tumor samples (Fig. 2). Along with these
two pathways, glycolysis (86) is also significantly up-regulated in many samples, consistent with an
enhanced glucose uptake frequently observed in tumors[14]. Among other pathways displaying frequent and significant up-regulation are
pathways related to protein synthesis (aminoacyl-tRNA biosynthesis (81)) and glycoprotein
biosynthesis (N-glycan biosynthesis (40)). In tumor samples where the expression of the
aforementioned pathways is not significantly changed, the overall metabolic gene expression was
mostly either down-regulated or not significantly changed. Specifically, this is the case for 72% of
tumor samples with no significant change in the glycolysis pathway expression, 84% of tumor samples
with no change in the purine biosynthesis pathway expression, and 88% of tumor samples with no
change in the pyrimidine biosynthesis pathway expression. The down-regulation of the overall
metabolic gene expression is common for tumors originating from human tissues with significant
metabolic functions (see principal component analysis below).In contrast to the biosynthesis pathways, pathways responsible for degradation of
essential amino acids (valine, leucine and isoleucine degradation (22)), cofactors (retinol
metabolism (75)), and fatty acids (9) are frequently and significantly down-regulated.
Interestingly, two metabolic pathways that are also consistently down-regulated across various
tumors are xenobiotic (82) and drug (83) metabolism. These processes are responsible for
detoxification and disposal of compounds foreign to the normal biochemistry of the cell. Several
previous studies have shown that polymorphisms in cytochrome P450 genes correlate with cancer
susceptibility in different types of cancer, including those of the lung, bladder and
breast[15, 16]. Although the specific reasons for the decreased expression of xenobiotic pathways
in cancer need to be further investigated, it is possible that this down-regulation contributes to
the increased sensitivity of tumor cells to chemotherapies.The heterogeneous behavior of the oxidative phosphorylation (15) and the TCA cycle (1)
pathways is also notable (Fig. 2). Interestingly, oxidative
phosphorylation shows the most heterogeneous behavior of all considered metabolic pathways. In
brain, colon, kidney, pancreatic and thyroid cancers genes involved in oxidative phosphorylation are
significantly down-regulated, whereas in breast, leukemia, lung, lymphoma and ovarian cancers these
genes are significantly up-regulated (Supplementary Table 7). This pattern suggests that the role of oxidative phosphorylation is
not universal for all tumors, but possibly reflects the adaptation of different cancers to
tissue-specific physiological conditions such as hypoxia, nutrient availability, or complement of
genetic lesions driving a specific tumor type.We also explored the heterogeneity of metabolic pathway expression across different
samples of the same (or similar) tumor types. Such an analysis (Online Methods and Supplementary Fig. 5) showed that oxidative
phosphorylation gene expression is not only heterogeneous between different tumor types, but also
frequently varies between samples of the same tumor. This observation suggests that the activity of
oxidative phosphorylation is influenced not only by the variability of environments across different
tumor types, but also by the specific physiological conditions and/or genetic composition of
individual tumors in each cancerpatient. In contrast, other metabolic pathways showed similar
expression patterns across different samples of the same tumor.
Correlation between metabolic pathways and signaling genes
We next investigated correlations between the expression of metabolic pathways and
expression of signaling and regulatory genes frequently involved in tumorigenesis. Although several
previous studies[17] demonstrated that correlated
expression patterns usually cannot be equated with regulation causality, i.e. one gene being
regulated by the other, significant correlations could still reveal important functional
relationships. To evaluate expression correlations we used the context likelihood of relatedness
(CLR) method[18], which is based on mutual
information between expression patterns and controls for specificity of each discovered relationship
(Online Methods), to identify significant interactions (Z-score > 2.0, Supplementary Table 8) between the 214
non-metabolic genes annotated in the KEGG signaling/cancer pathways (Supplementary Table 9) and the 87 metabolic
pathways considered in our analysis; significant relationships identified by CLR suggest high mutual
information between expression patterns of the corresponding genes and pathways.The CLR analysis revealed several interesting relationships. The oxidative
phosphorylation pathway has high mutual information with the hypoxia-inducible factor (HIF1A) and
its negative regulator RBX1. Notably, the oxidative phosphorylation expression is anti-correlated
with the expression of HIF1A, and is correlated with the expression of RBX1. The observed
anti-correlation between oxidative phosphorylation and HIF1A suggests that the heterogeneity in the
expression of oxidative phosphorylation genes (Fig. 2) is
likely to be influenced by tumoroxygen availability. In addition, the mutual information between
HIF1A and glycolysis is not high, likely because HIF1A is involved in the up-regulation of
glycolysis specifically under hypoxia, although many tumors show a strong expression of oxidative
phosphorylation and may not be hypoxic. In contrast, there is significant mutual information between
glycolysis and CDC42, a gene essential for cell cycle progression. Glycolysis is also strongly
correlated with expression of RAS and genes from the MAPK pathway which have been previously
implicated in promoting aerobic glycolysis[19].
Apart from glycolysis, CDC42 expression has also high mutual information with other pathways
essential for fast cellular growth (such as purine, pyrimidine, and amino acids biosynthesis). On
the other hand, CDC42 expression is not correlated with the expression of oxidative phosphorylation,
suggesting that in fast growing tumor cells glucose fermentation dominates. This observation agrees
with the results of the principal component analysis described below.
Principal component analysis of pathway expression changes
Individual metabolic pathways do not function in isolation. In contrast, they display
highly correlated and interdependent patterns of gene expression. Therefore, we used principal
component analysis (PCA)[20] to better understand
the joint behavior of metabolic pathways in cancer (Table 1).
To reduce noise associated with heterogeneous expression of individual pathways, we considered the
expression changes in the space of nine meta-pathways representing major metabolic processes (Online
Methods). Combined, the first three principal components were able to capture approximately 85% of
the meta-pathway expression variance.
Table 1
Principal component analysis (PCA) of gene expression in major metabolic processes. The PCA was
performed using the average expression changes of genes forming nine metapathways representing major
biochemical processes. The pathway weights indicate the relative contribution of each meta-pathway
to the principal components; weights with identical signs indicate correlated contributions of
pathways to a component, while weights with opposite signs indicate anti-correlated contributions.
The table show the PCA weights of each meta-pathway for the first three principal components. The
first three principal components explain ~62%, ~16% and ~7% of variance in
expression data, respectively
Variables
Number of genes involvedin each pathway
Weights in the 1stcomponent
Weights in the 2ndcomponent
Weights in the 3rdcomponent
Oxidative phosphorylation
135
−0.40
0.21
0.67
Glycolysis
24
−0.33
0.65
−0.01
Citric acid cycle
21
−0.50
−0.10
0.02
Amino acids biosynthesis
70
−0.27
−0.11
−0.17
Fatty acids and lipids biosynthesis
66
−0.28
0.05
0.09
Nucleotides and nucleosides biosynthesis
54
−0.35
0.34
−0.67
Amino acids degradation
123
−0.31
−0.40
−0.15
Fatty acids and lipids degradation
80
−0.22
−0.35
0.17
Nucleotides and nucleosides degradation
9
−0.26
−0.34
−0.04
Proportion of variance explained by eachcomponent
− 0.62
0.16
0.07
The first principal component accounts for ~62% of the variance in the
meta-pathway expression changes. As all pathway weights for this component have the same sign and
similar values, it represents an approximately uniform shift in the overall expression of metabolic
genes. The projection of cancer samples onto the plane defined by the first and second principal
components (Supplementary Fig. 6) shows
that tumors of the digestive system (colon, kidney and liver) have high positive shift along this
component, suggesting an overall decrease in metabolic gene expression. In contrast, other tumors
(for example, cervix and lymphomas) show an overall increase in metabolic gene expression. It is
likely that the observed shifts along the first principal component reflect, at least to some
extent, the loss of specific metabolic functions required by the corresponding normal tissues, and
the switch to a metabolic program primarily focused on growth and proliferation. This may account
for the overall decrease in expression of metabolic genes observed in tumors of the gastrointestinal
system that normally have high metabolic gene expression unique to these differentiated tissues.Shifts along the second component, explaining ~16% of the expression variance,
involve a change in the expression of glycolysis and nucleotide biosynthesis with a concomitant
opposite change in the expression of three catabolic pathways. Because an increased rate of
nucleotide biosynthesis is especially important during ribosome biogenesis and chromosomal
duplication, our results suggest that dividing cells appear to increasingly rely on glycolysis.
Oxidative phosphorylation is also associated with the second component, although with a
significantly smaller weight than glycolysis (0.21 versus 0.65). Consequently, along this component
glycolysis occurs concurrently with oxidative phosphorylation. Shifts along the third principal
component, explaining ~7% of the variance, primarily involve a strong change in the
expression of oxidative phosphorylation with a concomitant opposite change in nucleotide
biosynthesis. Consequently, a strong up-regulation of oxidative phosphorylation along this component
is likely to be associated with slower growth rates.
Individual biochemical reactions and isoenzymes
Next, we focused on expression changes associated with individual biochemical reactions,
which form the most basic level in hierarchical organization of the human metabolic network. We used
2,307 reactions, each associated with at least one known enzyme (Online Methods) in a model of human
metabolism[21]. In the human metabolic network and
in the networks of other organisms[22], a given
biochemical reaction is frequently catalyzed by several different isoenzymes. Isoenzymes may be
encoded by separate genes or arise from splice variants of the same gene. In the network model we
used[21], ~30% of metabolic reactions contain
at least two known isoenzymes, and this percentage is even higher (~40%) for the reactions of
central carbon metabolism. Different kinetic and regulatory properties of isoenzymes are often
fine-tuned to meet specific metabolic requirements of various human tissues[22]. Owing to metabolic demands and constraints different from
those of native tissues, it is likely that tumors might preferentially express isoenzymes that
facilitate survival and uncontrolled proliferation[23,
24].The heterogeneity of isoenzyme expression across tumors is apparent from the analysis of
central metabolism (Fig. 3). Although genes encoding glycolytic
enzymes are frequently up-regulated in tumors, some isoenzymes are down-regulated in specific
cancers. Gene expression is significantly increased for key enzymes of the pentose phosphate pathway
(PPP), including both the oxidative and non-oxidative branches. Lactate dehydrogenase (LDH) is also
strongly up-regulated, consistent with the high level of lactate production observed in many
tumors[25]. Frequent down-regulation of the PDH
complex likely contributes to a decreased flux of pyruvate into the TCA cycle observed in many
tumors. The enzymes essential for purine and pyrimidine synthesis, as well as the glutathione
synthetase (GSS)[26], are strongly up-regulated.
Although glutaminase (GLS) - an enzyme important for the TCA cycle anaplerosis - is generally
down-regulated, it has been demonstrated that this enzyme is strongly up-regulated
post-transcriptionally by the MYC-mediated suppression of miR-23a/b[27]. Notably, a recent study[28] suggested that an alternative route for the glutamine-to-glutamate transformation,
perhaps through nucleotide biosynthesis amidotransferases, may play an important role in the
glutamine-dependent anaplerosis. This hypothesis is consistent with a strong up-regulation of the
corresponding enzymes (PPAT and CAD) across tumors.
Figure 3
Tumor-induced mRNA expression changes for individual biochemical reactions in central metabolism.
(a) Each metabolic reaction is marked with the number of tumors (out of 22 considered
in our analysis) in which at least one isoenzyme catalyzing the corresponding reaction is
significantly (FDR-corrected P-value < 0.05) up-regulated (red) and down-regulated (blue).
(b) Reactions that are significantly up-regulated (red triangles) or down-regulated
(blue triangles) when all isoenzymes and members of the corresponding protein complexes are
considered together across all tumors (deep red or deep blue, FDR-corrected P-value < 0.05;
light red or light blue, FDR-corrected P-value < 0.1). If unmarked, no statistically
significant change in mRNA expression was detected.
We investigated changes in relative isoenzyme expression using the Kullback-Leibler (KL)
divergence (Online Methods); the KL divergence is an information-theoretic measure used to quantify
the difference between two probability distributions. For each biochemical reaction the KL
divergence was used to measure shifts in the distribution of isoenzyme expression between tumors and
corresponding normal tissues. This analysis demonstrated that, on average, the relative expression
patterns of isoenzymes are about two times more similar for different samples of identical normal
tissues than for different samples of identical tumors (Fig.
4a). But more importantly, both of these distances are significantly smaller than the average
distance between isoenzyme expression patterns in tumors and corresponding normal tissues
(Mann-Whitney U test P-value < 2*10−16). This suggests that for many
biochemical reactions neoplastic transformation leads to a significant shift in the relative
expression of isoenzymes.
Figure 4
Cancer-induced changes in relative isoenzyme expression. (a) The Kullback-Leibler
(KL) divergence was used to characterize differences in the relative expression of isoenzymes for
all biochemical reactions with multiple isoenzymes. Colors represent distributions of the KL
divergence in isoenzyme expression between different samples of identical normal tissues
(Normal-Normal, blue), different samples
of identical tumors (Tumor-Tumor, red),
and tumors and corresponding normal tissues
(Tumor-Normal, green). Inset summarizes
the average KL divergences between pairs of tissues as a percentage of the average KL divergence
between different samples of identical normal tissues. (b) Relative expression of the
aldolase isoenzymes for kidney, liver, stomach, brain (GBM) tumors and the corresponding normal
tissues.
The human aldolase is a notable example of an enzyme with perturbed expression patterns
in tumors (Fig. 4b). The enzyme has three main isoforms A, B
and C. Although aldolase A (ALDOA) is preferentially expressed in muscle cells, it is also strongly
expressed in most other human tissues. Aldolase B (ALDOB) is preferentially expressed in the liver
and aldolase C (ALDOC) in the brain. The expression analysis shows that the expression of ALDOA,
relative to the other aldolase isoenzymes, significantly increases in tumors. Notably, ALDOA is also
highly expressed in developing embryos[29], and
therefore may be particularly suitable for metabolic requirements during fast cell division. Indeed,
kcat value for ALDOA is significantly higher than that of the other isoenzymes[30].Another example of an enzyme with perturbed expression patterns is aconitase. Our
analysis suggests that the citrate efflux from the TCA cycle is likely to be enhanced in cancers by
frequent down-regulation of the mitochondrial isoform of aconitase (ACO2) (Fig. 3a,b). Cytosolic citrate is used to generate acetyl-CoA, an important precursor
required for many biosynthetic reactions involving lipogenesis[31]. Inhibition of the mitochondrial aconitase in normal human tissues[32] and yeast[33] was previously shown to significantly increase the TCA citrate efflux. The strong
up-regulation of the ATP citrate lyase (ACL) across tumors (Fig.
3) provides additional support for the idea that these changes promote fatty acid
biosynthesis in tumors. A recent study showed that an important route for the synthesis of lipogenic
acetyl-CoA under hypoxia[34] is through reductive
metabolism of α-ketoglutarate by cytosolic isocitrate dehydrogenase (IDH1) and cytosolic
aconitases (ACO1/ACO3). This pathway is also supported by the observed expression patterns because
in contrast to the mitochondrial aconitase, the cytoplasmic aconitases are frequently up-regulated
in specific cancers (Fig. 3a), and similar patterns are
observed for IDH1 (see below).To identify specific isoenzymes with frequently perturbed expression profiles, we
calculated, for each isoenzyme in every biochemical reaction, the number of tumors in which the
fractional expression of the isoenzyme among all isoenzymes catalyzing the same reaction is
significantly up-regulated (Online Methods). After correcting for multiple hypothesis testing
(Online Methods), 919 isoenzymes were relatively up-regulated in at least one tumor type, and 322
were up-regulated in more than 25% of the 22 tumor types considered in our analysis (Supplementary Table 12).
Expression of enzymes with recurrent tumor mutations
We next investigated expression changes for metabolic genes with known tumor-associated
mutations. Recent sequencing studies have identified recurrent mutations in several genes associated
with the TCA cycle[35, 36]. Heterozygous somatic mutations in two isoenzymes of isocitrate dehydrogenase
(cytoplasmic IDH1 and mitochondrial IDH2) are frequently detected in gliomas and acute myeloid
leukemia (AML). These gain-of-function mutations affect the IDH active site, and make it possible
for the mutated enzymes to catalyze the conversion of α-ketoglutarate to 2-hydroxyglutarate
(2HG), which has been proposed to promote cancer development[37]. Our analysis reveals that IDH1 and IDH2 isoenzymes are frequently up-regulated in
cancers (Fig. 3b), but the expression of the other isocitrate
dehydrogenase isoenzyme IDH3 (not commonly mutated in tumors) is not significantly perturbed. A
detailed analysis of IDH expression across cancers (Supplementary Table 13) demonstrated that the up-regulation P-values of IDH1 and
IDH2 for the three brain cancers and AML are among the five most significant of all considered tumor
types. Recent sequencing efforts[38] also
demonstrated the presence of similar IDH active site mutations in peripheral T-cell lymphoma,
another tumor in our study with significant up-regulation of IDH expression (Supplementary Table 13).Germline and somatic loss-of-function mutations in fumarate hydratase (FH) and three
subunits of succinate dehydrogenase (SDHB, SDHC, SDHD) are also observed in several tumors including
renal cell carcinoma (RCC)[36, 39]. These deleterious mutations lead to the accumulation of the metabolites
fumarate and succinate that regulate hypoxia-inducible factor (HIF) protein levels and chromatin
state to influence tumor growth[40, 41]. We found that the SDH subunits (SDHB, SDHC, SDHD) and FH are strongly
down-regulated specifically in RCC (Supplementary
Tables 14 and 15). The only cancer
in our analysis with a more significant down-regulation is colorectal cancer, in which decreased
expression of SDH was reported previously[42].
Although no somatic mutation in SDH or FH has been observed in colorectal cancer[43, 44], the
significant decrease in their expression, similar to deleterious mutations in other tumors, is
likely to cause mitochondrial efflux of the tumor-promoting TCA cycle intermediates and contribute
to tumorigenesis.
Analysis of TCA cycle metabolites in colon cancer
To confirm our computational prediction about the TCA cycle intermediates in colon
cancer, we measured and analyzed concentrations of specific metabolites from 10 colon cancerpatients. The metabolite levels were obtained using GC/MS or LC/MS (Online Methods) and contained
matched tumor and normal samples from each patient.Consistent with significant down-regulation of oxidative phosphorylation pathway genes
(Wilcoxon signed-rank test P-value = 10−9, Supplementary Table 7) and down-regulation of the PDH complex (P-value = 0.02) that
controls the majority of glucosecarbon flux into the TCA cycle, there is a significant decrease in
the citrate concentration (Wilcoxon signed-rank test P-value = 0.01) in tumor samples and a
concomitant increase in the lactate concentration (P-value = 0.001). Despite a large decrease in the
citrate concentration (average decrease ~65%, median ~90%), the average concentration
of a downstream metabolite succinate is only 33% lower than normal and the average concentration of
fumarate is more than 50% higher than in normal samples (P-value = 0.03). This pattern of
concentration changes is consistent with the significant down-regulation of the FH and SDH enzymes
observed in colon cancer expression profiles. Such a down-regulation should lead to a significant
increase, relative to the available citrate, in the concentration of their substrates fumarate and
succinate. Notably, it was previously demonstrated[40] that even a small increase in fumarate concentration is enough to stabilize HIF1A
by inhibition of the α-ketoglutarate-dioxygenases regulating its degradation[41]. The average increase in fumarate concentration
(~50%) was about half of the amount observed previously for bi-allelic deletions of the FH
enzyme (~90%)[40]. For four patients in our
samples, fumarate concentration was >50% higher than in matched normal samples and for three
it was >100% higher. Consequently, the expression changes we observed should mimic the
effects of cancer-associated heterozygous FH mutations in a substantial fraction of colon cancerpatients.
DISCUSSION
Reprogramming of the metabolic network is now considered to be a hallmark of neoplastic
transformation[2]. An overarching conclusion of our
study is that cancer-induced changes in the expression of metabolic genes are very heterogeneous
across different tumor types, i.e. there is no uniform metabolic transformation associated with all
tumors. We observe heterogeneous behavior at all levels of biochemical organization, from global
expression patterns to metabolic pathways to individual reactions and corresponding isoenzymes. The
heterogeneous behavior of cancer metabolism is reminiscent of the high variability observed between
tumors in terms of genetic and expression changes in signaling and regulatory pathways[3].Notably, despite the heterogeneity between cancers, the metabolic expression changes
associated with individual tumors are not random. On the contrary, many of the observed changes are
reproducible in independent samples of the same tumors. We can discern several principles unifying
the observed tumor-induced expression perturbations. First, tumors often retain a significant
imprint of the metabolic expression patterns present in the corresponding native tissues. This may
be a consequence of similar local environments or indicate a relative rigidity of the metabolic
expression program established in the original tissue. Such behavior is conceptually similar to the
minimization of metabolic adjustments (MOMA) principle observed in microbial metabolism following
genetic perturbations[45]. Second, a large fraction
of the variance in the expression of major biochemical processes can be rationalized in terms of
several principal components, representing important expression modes for key metabolic processes.
Although, in agreement with physiological studies[14,
46], we do not observe universal up- or down-regulation
for genes associated with oxidative phosphorylation, the collective expression changes along the
second and third principal components suggest that fast-growing cells increasingly rely on glucose
fermentation. Third, we find that many hundreds of isoenzymes show significant and tumor-specific
expression changes. A substantial fraction of these changes are likely to be functionally important
and, at least in some cases, mimic (as in the case of SDH and FH) or possibly enhance (as in the
case of IDH) the effects of recurrent tumor-promoting genetic mutations.Beyond understanding of tumor-induced expression changes, we believe that our analysis
has important implications for the development of anticancer therapeutics. Functionally important
isoenzymes with cancer-specific expression changes can potentially serve as drug targets. The
possibility of targeting specific isoenzymes, such as GLS1[8] and PKM2[7], has already been
demonstrated, but our analysis suggests that many other potential targets may be pursued in a
similar way. Due to the tumor-specific nature of the observed expression patterns, such targeting
will require a focused analysis and understanding of essential metabolic transformations in each
specific cancer type.
ONLINE METHODS
Microarray expression datasets
Published gene expression datasets were assembled from the GEO[9] and ArrayExpress[10]
databases (Supplementary Table 1). Unless
specified otherwise, we analyzed only expression data obtained using the most comprehensive human
expression array platform (HG U133 Plus 2.0; Supplementary Table 2). For calculations involving global network properties and comparisons
of expression data between different studies (Fig. 1), samples
from all datasets were processed together. For all other calculations, tumor and normal samples from
the same study were processed together. The affyQCReport package from Bioconductor (www.bioconductor.org) was used to search for
poor quality chips. For GeneChip arrays that passed Quality Control (QC) checks, we used the GCRMA
algorithm[47] from Bioconductor to perform quantile
normalization and extract gene expression values on the log2 scale.
Calculation of differential expression (DE) for metabolic genes
Separately for each dataset, the Bioconductor method limma[48], which is based on a modified t-statistic, was used to analyze differences
between tumor samples and corresponding normal samples. Using the method we calculated the
differential expression for each metabolic gene on the log2 scale. The differential expression
P-values were adjusted for multiple hypothesis testing using Benjamini and Hochberg’s
method[49], controlling False Discovery Rate (FDR)
at 5%.
Calculation of the global divergence between a pair of expression profiles
Two different measures of divergence between a pair of expression profiles were used in
our study: (1) The Euclidean distance, , where x and
y are the expression of gene i over two expression
profiles with p and q samples (x1,
x2 ,…, x ),
(y1, y2 ,…,
y ), n = 1421 is the number of genes assigned to at
least one metabolic pathway in the KEGG database, (2) The correlation-based distance
d = 1–r(Average log 2
x), Average(log2
y)), where r is the Spearman’s rank correlation coefficient
between average log2 expression values of corresponding genes in the two expression profiles.When comparing datasets across different studies it is important to consider batch
effect arising due to variations in laboratory conditions and measurements. To explore and address
batch effect, we collected a set of microarray expression data for the same tissues/tumor types from
multiple independent studies (Supplementary Table
4). All samples in Supplementary Table
4 were processed and normalized together. To estimate the influence of the batch effect, we
calculated d1, the average expression distance between tumors
(Tumor) and corresponding normal tissues
(Normal) measured in different studies, and
d2, the average expression distance between tumors
(Tumor) and corresponding normal tissues
(Normal) measured in the same studies. The difference
( d1 – d2 ) represents the average
batch effect due to comparisons across different studies. To account for the batch effect the
difference ( d1 – d2 ) was subtracted
from all expression distances calculated between different studies.
Identification of metabolic pathways with significant expression changes
We used two different approaches to identify metabolic pathways with significant
expression changes. The two approaches resulted in very similar results. In the first approach,
which was used for all calculations presented in the paper, for each gene a, we
calculated its expression change in tumor sample i relative to the corresponding
normal samples, , where is the expression in tumor sample i, and
y is the expression in the s corresponding normal
samples (y1 ,y2 ,…,
y ). Wilcoxon signed-rank test of ΔE for all
genes within a metabolic pathway was then used to determine the significance of up- or down-
regulation of the pathway in that tumor sample. In the second approach, for each gene
a, we calculated the z-score of its expression in tumor sample i
relative to the distribution of its expression values in the s corresponding normal
samples, , where is the standard deviation. Wilcoxon signed-rank test of
z was then used to determine the significance of up- or down- regulation of each
pathway in that tumor sample. The pathway expression heterogeneity based on the second approach is
shown in Supplementary Fig. 4c.
Statistical significance of the observed metabolic pathway expression patterns
We used randomized expression data to assess the statistical significance of the
reported pathway expression patterns (Fig. 2). To generate the
null distributions for the () and () values we used the real expression data and randomly permuted
metabolic gene labels while preserving the pathway sizes. We then calculated the null distributions
using the same procedure as the one applied to the real data. Supplementary Fig. 3 shows the null distributions
for the 10 top-regulated pathways (pathways with highest () values in Fig. 2) based on 1000
random permutations of the expression data.
Pathway expression heterogeneity across tumor samples of the same tumor type
To investigate the pathway expression heterogeneity across tumor samples of the same
tumor type, we introduced a pathway-specific heterogeneity metric , where n is the fraction of tumor samples of a certain
tumor type in which the pathway is significantly up-regulated, and m is the
fraction of samples in which the pathway is significantly down-regulated. According to this
definition, for high H values the pathway shows consistent expression changes
across different samples of the same tumor type, i.e. the pathway is mostly up-regulated or mostly
down-regulated. On the other hand, for small H values the pathway expression is
variable, i.e. in some tumor samples the pathway is significantly up-regulated, while in other
samples of the same tumor type the pathway is significantly down-regulated. The distribution of
H values across 22 tumor types or 16 tumor types of different tissue-of-origin for
10 top-regulated pathways (pathways with highest (n+m) values), is
shown in Supplementary Fig. 5. The 16
tumors types of different tissue-of-origin were obtained from 22 tumor types by considering samples
of the three types of brain cancers, the two types of breast cancers and the four types of lymphomas
together, respectively.
Calculation of significant relationships between metabolic pathway expression and expression
of non-metabolic cancer/signaling genes
The context likelihood of relatedness (CLR) method[18] is based on mutual information between expression profiles. CLR was used to
identify significant relationships between 214 non-metabolic cancer/signaling genes annotated in the
KEGG database and the 87 KEGG metabolic pathways (Supplementary Table 5). The set of 214 non-metabolic cancer/signaling genes was assembled
using the following two criteria: (1) genes either from the 14 KEGG cancer or 25 KEGG signaling
pathways (see Supplementary Table 9), and
(2) not within any of the 87 KEGG metabolic pathways. For each gene a in each tumor
sample i, the expression change was calculated. And the mutual information (MI) between each
non-metabolic cancer/signaling gene i and each metabolic pathway j
was calculated across all tumor samples in our study: , where n is the number of genes within the
a pathway j. All mutual information values were computed using 10
bins of ΔE ; the calculated values were not sensitive to the exact number of
bins used. The CLR interaction Z-score for each gene i and pathway
j pair was calculated using (I) the z-score (z )
of MI relative to the distribution of
{MI,
MI,…,MI
}, and (II) the z-score ( z ) of MI
relative to the distribution of
{MI1,,MI2,,…,MI214,
}. In Supplementary Table 8 we show the
identified significant relationships (with Z-score > 2.0) for each metabolic pathway.
Principal component analysis
The nine meta-pathways used for the principal component analysis were compiled by
combining genes from corresponding metabolic pathways in the KEGG and BioCyc databases. To perform
the principal component analysis (PCA), we calculated the p-by-q
matrix D for tumor-to-normal expression changes of the meta-pathways, where
p = 466 (the total number of tumor samples in our study) and q = 9
(the number of meta-pathways). We used two different approaches to calculate D. The
two approaches resulted in very similar principal components. In the first approach, the (i,
j)-element of the matrix D is the average gene-specific
expression changes in tumor sample i across n genes
within meta-pathway j: . In the second approach, D is the average
gene-specific z-scores: . Principal components were then obtained using the covariance method,
i.e. we first centered the columns of D by subtracting the column means, and then
calculated a covariance matrix based on D. The covariance matrix was then
diagonalized and the eigenvectors and eigenvalues were calculated.The results of the PCA analysis based on the first approach are shown in Table 1 and the results based on the second approach in Supplementary Table 10. We also explored the
influence of genes shared between meta-pathways on the PCA results. The results obtained when all
overlapping genes were excluded (Supplementary
Table 11) were very similar to the results with all meta-pathway genes.
Human metabolic annotations used for isoenzyme expression analysis
The human metabolic network compiled by Duarte et al.[21] was used for isoenzyme expression analysis. The network
contains 1496 genes, 2712 compartment-specific metabolites, and 3743 internal and exchange
reactions. In the analysis we used 2307 network reactions that are associated with at least one
known metabolic gene. Proteins that are responsible for catalysis of identical reactions and are not
members of the same complex were considered as isoenzymes. In total, the network by Duarte
et al. contains 667 metabolic reactions with at least two isoenzymes.
Calculation of distances between isoenzyme expression patterns
The Kullback-Leibler (KL) divergence was used to quantify the changes in the relative
expression of isoenzymes for pairs of expression profiles. For each sample, the fractional
expression of a particular isoenzyme i was first calculated
(n is the number of isoenzymes catalyzing the reaction
and x is the expression value of the isoenzyme i). The
flexmix package in R was then used to estimate the Kullback-Leibler divergence between the discrete
distributions
{m(f1),m(f2),…,m(f)}
and
{g(f1),g(f2),…,g(f)},
where m(f) and g(f) are the
averages of the two expression profiles over p and q samples
(x1 ,x2 ,…,
x ), (x1 ,x2
,…, x ).
Identification of isoenzymes preferentially expressed in specific tumors
For each considered isoenzyme we used the non-parametric Mann-Whitney U test to
determine the significance of its fractional expression changes in the tumor samples relative to the
normal samples. Specifically, for an isoenzyme i we calculated its fractional
expression among all isoenzymes associated with the same reaction: , where n is the number of isoenzymes catalyzing the
same reaction, and x is the expression value of the isoenzyme
i. We then used the Mann-Whitney U statistic to test the hypothesis that the
distribution of f values for tumor samples associated with a particular
cancer type has significantly larger mean than the distribution of f
values for the corresponding normal samples. All the P-values were FDR-adjusted at 5% considering
the total number of tested hypothesis, 22704 (1032 isoenzymes times 22 cancer types). The isoenzymes
passing the significance threshold (P-value < 0.05) are reported in Supplementary Table 12. We confirmed the
isoenzyme results using an independently collected expression data from the TCGA
consortium[50]; for the confirmation we used four
tumor types from TCGA (glioblastoma multiforme, breast invasive carcinoma, colon adenocarcinoma,
ovarian serous cystadenocarcinoma). Using the same tumor type 70% of the isoenzymes in Supplementary Table 12 showed the same
up-regulation behavior in TCGA as in the dataset analyzed in the paper.
Statistical significance and multiple hypothesis testing
For pathway and isoenzyme calculations involving multiple hypothesis testing, all the
corresponding P-values were adjusted with the BH procedure[49] (using the multtest package in R) to control the false discovery rate (FDR) at
0.05. The FDR-corrected P-values were used to analyze statistical significances, and unless
specified otherwise, significance was reported for the adjusted P-value < 0.05.
Quantitative metabolite profiling of TCA cycle intermediates in colon cancer. Sample
collection and metabolite extraction
Tumors and surrounding grossly normal-appearing tissues were obtained from 10 colon
cancerpatients after surgical treatment. The excised tissues were immediately stored at
−80°C. Samples were extracted and prepared for analysis using Metabolon’s
standard solvent extraction method. The extracted samples were split into equal parts for analysis
on the GC/MS and LC/MS platforms.
GC/MS
The samples destined for GC/MS analysis were re-dried under vacuum desiccation for a
minimum of 24 hours prior to being derivatized under dried nitrogen using
bistrimethyl-silyltriflouroacetamide (BSTFA). The GC column was 5% phenyl and the temperature ramp
is from 40° to 300° C in a 16 minute period. Samples were analyzed on a
Thermo-Finnigan Trace DSQ fast-scanning single-quadrupole mass spectrometer using electron impact
ionization.
LC/MS
The LC/MS portion of the platform was based on a Waters ACQUITY UPLC and a
Thermo-Finnigan LTQ mass spectrometer, which consisted of an electrospray ionization (ESI) source
and linear ion-trap (LIT) mass analyzer. The sample extract was split into two aliquots, dried, then
reconstituted in acidic or basic LC-compatible solvents, each of which contained 11 or more
injection standards at fixed concentrations. One aliquot was analyzed using acidic positive ion
optimized conditions and the other using basic negative ion optimized conditions in two independent
injections using separate dedicated columns. Extracts reconstituted in acidic conditions were
gradient eluted using water and methanol both containing 0.1% Formic acid, while the basic extracts,
which also used water/methanol, contained 6.5 mM Ammonium Bicarbonate. The MS analysis alternated
between MS and data-dependent MS2 scans using dynamic exclusion.
Data extraction and compound identification
The data extraction of the raw mass spec data files yielded information that could be
loaded into a relational database and manipulated without resorting to BLOB manipulation. Once in
the database the information was examined and appropriate QC limits were imposed. Peaks were
identified using Metabolon’s proprietary peak integration software, and component parts were
stored in a separate and specifically designed complex data structure. TCA cycle intermediates were
identified by comparison to library entries of purified standards. The combination of
chromatographic properties and mass spectra gave an indication of a match to the specific compound
or an isobaric entity. The collected metabolite data is presented in Supplementary Table 16.
Authors: Tzuling Cheng; Jessica Sudderth; Chendong Yang; Andrew R Mullen; Eunsook S Jin; José M Matés; Ralph J DeBerardinis Journal: Proc Natl Acad Sci U S A Date: 2011-05-09 Impact factor: 11.205
Authors: Heather R Christofk; Matthew G Vander Heiden; Marian H Harris; Arvind Ramanathan; Robert E Gerszten; Ru Wei; Mark D Fleming; Stuart L Schreiber; Lewis C Cantley Journal: Nature Date: 2008-03-13 Impact factor: 49.962
Authors: Pedro Romero; Jonathan Wagg; Michelle L Green; Dale Kaiser; Markus Krummenacker; Peter D Karp Journal: Genome Biol Date: 2004-12-22 Impact factor: 13.583
Authors: Dimitrios Anastasiou; Yimin Yu; William J Israelsen; Jian-Kang Jiang; Matthew B Boxer; Bum Soo Hong; Wolfram Tempel; Svetoslav Dimov; Min Shen; Abhishek Jha; Hua Yang; Katherine R Mattaini; Christian M Metallo; Brian P Fiske; Kevin D Courtney; Scott Malstrom; Tahsin M Khan; Charles Kung; Amanda P Skoumbourdis; Henrike Veith; Noel Southall; Martin J Walsh; Kyle R Brimacombe; William Leister; Sophia Y Lunt; Zachary R Johnson; Katharine E Yen; Kaiko Kunii; Shawn M Davidson; Heather R Christofk; Christopher P Austin; James Inglese; Marian H Harris; John M Asara; Gregory Stephanopoulos; Francesco G Salituro; Shengfang Jin; Lenny Dang; Douglas S Auld; Hee-Won Park; Lewis C Cantley; Craig J Thomas; Matthew G Vander Heiden Journal: Nat Chem Biol Date: 2012-10 Impact factor: 15.040
Authors: Rachel S Kelly; Matthew G Vander Heiden; Edward Giovannucci; Lorelei A Mucci Journal: Cancer Epidemiol Biomarkers Prev Date: 2016-04-06 Impact factor: 4.254