
Differential network analysis for the identification of condition-specific pathway activity and regulation.

Gennaro Gambardella, Maria Nicoletta Moretti, Rossella de Cegli, Luca Cardone, Adriano Peron, Diego di Bernardo.

Abstract

MOTIVATION: Identification of differentially expressed genes has led to countless new discoveries. However, differentially expressed genes are only a proxy for finding dysregulated pathways. The problem is to identify how the network of regulatory and physical interactions rewires in different conditions or in disease.
RESULTS: We developed a procedure named DINA (DIfferential Network Analysis), which is able to identify sets of genes whose co-regulation is condition-specific, starting from a collection of condition-specific gene expression profiles. DINA is also able to predict which transcription factors (TFs) may be responsible for the condition-specific co-regulation of a pathway. We derived 30 tissue-specific gene networks in human and identified several metabolic pathways as the most differentially regulated across the tissues. We correctly identified TFs such as Nuclear Receptors as their main regulators and demonstrated that a gene with unknown function (YEATS2) acts as a negative regulator of hepatocyte metabolism. Finally, we showed that DINA can be used to make hypotheses on dysregulated pathways during disease progression. By analyzing gene expression profiles across primary and transformed hepatocytes, DINA identified hepatocarcinoma-specific metabolic and transcriptional pathway dysregulation.
AVAILABILITY: We implemented an online web tool (http://dina.tigem.it) enabling the user to apply DINA to identify tissue-specific pathways or gene signatures.
CONTACT: dibernardo@tigem.it
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2013 PMID: 23749957 PMCID: PMC3702259 DOI: 10.1093/bioinformatics/btt290

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Gene Expression Profiles (GEPs), measured in different conditions and cell types via microarrays or, more recently, next generation sequencing, have been extensively used in computational systems biology to reverse engineer gene regulatory networks (Bansal ; Marbach ). The main use of reverse engineering has been the identification of unknown regulatory or functional interactions among genes, microRNAs and proteins from large datasets (Belcastro ; Sumazin ). State-of-the-art reverse-engineering methods model gene networks as static processes, i.e. regulatory interactions among genes in the network (such as direct physical interactions or indirect functional interactions) do not change across different conditions or tissue types. However, different cell types, or the same cell type in different conditions, may carry out different functions, and it is expected that their regulatory networks reflect these differences. Several methods have been proposed to identify active subnetworks across different conditions from changes in gene expression. One of the first attempts is a general method to search for ‘active sub-networks’ connecting genes with unexpectedly high levels of Differential Expression (Ideker ). This method requires a single network as input, and it identifies a set of genes (i.e. a subnetwork) whose expression changes across two conditions. However, changes in expression may be mild or absent, even when the subnetwork is active. Hence, looking only at the differential expression levels of genes may not be sufficient. Therefore, more recent approaches attempted to identify Differential Co-regulation (DC) of genes in the subnetwork (Choi ; Kostka ; Langfelder ; Leonardson ; Ma ; Odibat ; Reverter ; Watson ; Yinglei ). By differentially co-regulated (DC) genes, we mean sets of genes that are co-regulated in a specific condition but not in others (de la Fuente, 2010; Ideker ).
The main differences among all of these approaches are in how the genes to be tested are selected, how co-regulation is measured (e.g. Pearson Correlation Coefficient or Mutual Information) and how DC across the conditions is quantified. Some of the most advanced methods go beyond pair-wise co-regulation and aim at automatically identifying de novo subnetwork(s) containing genes whose co-regulation changes the most across two or more conditions (Langfelder ; Ma ; Odibat ). This is achieved by advanced optimization techniques such as genetic algorithms, which, however, are computationally intensive (Kostka ), as they require checking all of the possible subnetworks to identify the ones that are most dysregulated. Hence, these methods are limited in the number of different conditions that can be compared (Choi ; Langfelder ; Watson ), and they may require fine tuning of the algorithm parameters (Ma ). Here, we developed and applied a simple but powerful procedure named DINA (DIfferential Network Analysis), which, unlike other methods, does not aim at identifying de novo subnetworks of genes but rather at identifying whether a known pathway is differentially co-regulated across a set of conditions. DINA is also able to predict which transcription factors (TFs) may be responsible for the condition-specific co-regulation. DINA requires as input a set of N networks and a set of M genes, and it aims to identify whether co-regulation among the M genes in the set changes significantly across the N different conditions. We first applied DINA to identify tissue-specific pathways, and their TF regulators, starting from a collection of ∼3000 GEPs across 30 different tissues. DINA correctly discovered that the amino acid and fatty acid metabolic pathways are specifically active in liver and kidney (Hakvoort ; Jagoe ), despite the level of expression of these enzymes being similar across the tissues.
DINA correctly identified TFs of the nuclear receptor family as the main regulators of these pathways. DINA also revealed a novel candidate negative regulator of metabolic pathways, YEATS2, a gene not well characterized (Wang ). We experimentally verified its association to metabolic pathway regulation using starvation response in primary hepatocytes (Ding ). We then applied DINA to three different hepatocyte cell lines, from different stages of hepatocarcinoma (HCC). DINA correctly predicted that the main metabolic pathways, as well as the p53 transcriptional program, get severely dysregulated in HCC.

2 METHODS

2.1 Database of GEPs

We implemented a GEP database in the open source database management system PostgreSQL. We downloaded all the MicroArray Gene Expression Markup Language (Brazma ) annotation files present in the ArrayExpress repository and extracted the relevant information on the tissue or cell type for each experiment. We then re-annotated each experiment in a semi-automatic way using the tissue ontology eVOC (Hide ).

2.2 Reverse engineering of tissue-specific gene co-regulation networks

We classified 2930 microarrays (Affymetrix HG-U133A and HG-U133plus2) extracted from ArrayExpress into 30 different tissues. We normalized microarrays independently for each tissue using Robust Multichip Average as implemented in the R package Bioconductor (Rafael ) (Supplementary Material). We computed the Spearman Correlation Coefficient (SCC) (Hardin ) for each pair of probes in each tissue, obtaining a correlation matrix of dimension 22 215 × 22 215 (we excluded control probes) for each tissue. We estimated the SCC significance for each pair of probes by computing the t statistic of each SCC value and then using a Student’s t distribution to estimate the P-value. To control the number of False Positives owing to the multiple hypothesis testing problem, we estimated the degrees of freedom of the t distribution from the data by fitting the parameters of a Student’s t-location-scale distribution to the t statistics computed for all the probe pairs. We estimated the parameters by minimizing the squared error between the theoretical and the empirical distribution (Supplementary Material). In the construction of the SCC matrices, we did not apply any pre-filtering step to exclude low-variance probe-sets; however, we applied a stringent threshold on the corrected P-value to call an SCC value significant, thus reducing the number of False Positive co-regulatory interactions. To obtain gene-wise SCC matrices starting from the probe-wise SCC matrices, we first excluded probes that were associated to more than one gene on the Affymetrix HG-U133A platform (Ballester ), while keeping genes associated to more than one probe. Specifically, we mapped 12 161 genes from the probes in the HG-U133A Affymetrix platform (Ballester ). Of these 12 161 genes, 68% were associated to only one probe, and only 11% were associated to >2 probes (Supplementary Fig. S1).
Hence, for the same pair of genes, there could be multiple values of the SCC because the same gene can be associated to multiple probes in the microarray. In this case, we chose to assign to the gene pair the ‘signed’ maximal absolute value of SCC across all the different probe pairs, i.e. the SCC with the largest magnitude, keeping its sign. At the end of the procedure, we thus derived 30 gene-wise networks from the 30 probe-wise networks. An alternative way to transform the probe-wise SCC matrices to gene-wise SCC matrices would have been to apply a ‘gene centered’ normalization (Ferrari ) of the microarrays using a custom CDF, before the SCC computation, thus eliminating the problem of multiple SCC values. We however decided to preserve information on possible alternative transcripts for future work and for experimental validation. To demonstrate that our probe-wise to gene-wise network transformation was robust and comparable with the custom CDF approach, we chose as a case study the three cancer cell line networks presented in Section 3.8. We applied a recently proposed ‘gene centered’ normalization method (Ferrari ). Finally, we measured the similarity between the ‘gene centered’ SCC matrices and the ones we obtained with our strategy by computing the 2D correlation (Supplementary Table S1). The results show that the two approaches yield similar SCC values.
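The probe-wise to gene-wise transformation described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the authors' code: the function name collapse_to_genes, the dictionary-based inputs and the toy probe and gene identifiers are all hypothetical, and probes mapping to more than one gene are assumed to have been removed beforehand.

```python
def collapse_to_genes(probe_scc, probe_to_gene):
    """Collapse probe-pair SCC values into gene-pair SCC values.

    probe_scc     : dict mapping (probe_a, probe_b) -> SCC value
    probe_to_gene : dict mapping probe -> gene (one gene per probe)

    For every gene pair, the SCC with the largest absolute value is kept,
    with its sign preserved ('signed' maximal absolute value).
    """
    gene_scc = {}
    for (probe_a, probe_b), scc in probe_scc.items():
        gene_a = probe_to_gene.get(probe_a)
        gene_b = probe_to_gene.get(probe_b)
        # Skip unmapped probes and probe pairs belonging to the same gene.
        if gene_a is None or gene_b is None or gene_a == gene_b:
            continue
        key = tuple(sorted((gene_a, gene_b)))  # unordered gene pair
        if key not in gene_scc or abs(scc) > abs(gene_scc[key]):
            gene_scc[key] = scc
    return gene_scc


# Toy example: gene A is measured by two probes, gene B by one.
probe_to_gene = {"p1": "A", "p2": "A", "p3": "B"}
probe_scc = {("p1", "p3"): 0.4, ("p2", "p3"): -0.7, ("p1", "p2"): 0.9}
print(collapse_to_genes(probe_scc, probe_to_gene))
```

In the toy example, the pair (A, B) receives the value of the probe pair with the largest magnitude, and the within-gene probe pair is ignored.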

2.3 DINA

DINA requires as input a known set of M genes (i.e. genes belonging to a known pathway, or known targets of a TF), and it aims at assessing whether the co-regulation among the genes in the set changes significantly across N networks. We downloaded the full manually curated list of 186 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways from MsigDb (Liberzon ). We selected only those pathways consisting of genes that were present in our gene networks (i.e. at least 80% of the genes had to be present), obtaining a final list of 110 KEGG pathways (Supplementary Table S2). DINA uses an entropy-like measure to identify in which tissue(s) a pathway is active. In information theory, entropy is a measure of the uncertainty associated to a random variable. If V is a discrete random variable, then the entropy H(V) can be computed as:

$$H(V) = -\sum_{i} P(V = i)\,\log P(V = i)$$

Hence, the entropy H reaches its maximum value when each event is equi-probable and its minimum, i.e. H(V) = 0, when there is no uncertainty. In our settings, V assumes N categorical values, representing the N condition-specific networks. To compute P(V = i), we first computed the number of edges n_i connecting the M genes in the i-th network (adding a pseudo-count of 1) for i = 1, ..., N, and we then computed

$$P(V = i) = \frac{n_i}{\sum_{j=1}^{N} n_j}$$

P(V = i) can be interpreted as a probability because it is a number greater than 0 and it sums to 1 across all the N condition-specific networks by definition. P(V = i) will be equal to 1 only when the genes in the pathway are specifically co-regulated (i.e. connected) in network i and not co-regulated (connected) in any other network. Thus, P(V = i) represents the probability that the M genes in a pathway are co-regulated only in the i-th network and not in the other networks. We also developed and tested a slightly modified version of the entropy H(V), which also takes into account the pathway topology (i.e. which genes are connected to which in the known pathway) (Sales ) as described in Supplementary Material.
However, in the following analyses, we always used the H(V) formula previously described. We applied a permutation test in order to assess the entropy significance for each one of the KEGG pathways. The null distribution of H(V) was approximated by selecting a set of N random networks, with the same density as the original networks, and a set of M random genes. Random networks were obtained from the original network by randomly shuffling the gene labels. This procedure was repeated 10 000 times to estimate the H(V) P-value for each pathway. The P-value was then corrected using the Benjamini–Hochberg method (Benjamini ).
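The entropy score and its permutation test can be sketched in Python as follows. This is a minimal sketch under several assumptions rather than the authors' implementation: each network is represented as a set of unordered gene pairs, the function names (count_edges, dina_entropy, permutation_pvalue) are hypothetical, the natural logarithm is used, and the empirical P-value counts the random draws whose entropy is at most the observed one, because low H is the tail of interest.

```python
import math
import random


def count_edges(network, genes):
    """Count edges of `network` whose two endpoints both lie in `genes`.

    `network` is a set of frozensets, each holding two gene names.
    """
    gene_set = set(genes)
    return sum(1 for edge in network if edge <= gene_set)


def dina_entropy(networks, genes):
    """Entropy H(V) of the co-regulation probabilities across networks."""
    counts = [count_edges(net, genes) + 1 for net in networks]  # pseudo-count
    total = sum(counts)
    probs = [c / total for c in counts]
    return -sum(p * math.log(p) for p in probs)


def permutation_pvalue(networks, genes, all_genes, n_perm=1000, seed=0):
    """Empirical P-value of H(V) from label-shuffled networks and random genes.

    Shuffling gene labels keeps the density of every network unchanged.
    All genes appearing in the networks are assumed to be in `all_genes`.
    """
    rng = random.Random(seed)
    universe = sorted(all_genes)
    observed = dina_entropy(networks, genes)
    hits = 0
    for _ in range(n_perm):
        null_networks = []
        for net in networks:
            shuffled = universe[:]
            rng.shuffle(shuffled)
            relabel = dict(zip(universe, shuffled))
            null_networks.append(
                {frozenset(relabel[g] for g in edge) for edge in net}
            )
        null_genes = rng.sample(universe, len(genes))
        if dina_entropy(null_networks, null_genes) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: three genes fully connected in one network only.
clique = {frozenset(p) for p in [("g1", "g2"), ("g2", "g3"), ("g1", "g3")]}
pathway = ["g1", "g2", "g3"]
print(dina_entropy([clique, set(), set()], pathway))   # condition-specific
print(dina_entropy([clique, clique, clique], pathway))  # same in all networks
```

The first call returns a lower entropy than the second, since the toy pathway is connected in a single network. The default of 1000 permutations only keeps the example fast; the text above uses 10 000.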

2.4 Identification of transcriptional regulators of tissue-specific pathways

We selected 1358 TFs including putative TFs (Supplementary Table S3) from a list of 1988 human DNA-binding TFs compiled using information from public repositories (Ravasi ). We mapped the TFs onto the HG-U133A Affymetrix platform using only the probe sets associated to a single gene. For each TF, we computed the number of edges connecting it to the genes in the pathway of interest in each of the 30 Tissue Specific Co-regulatory Networks. We then assigned a P-value to each TF using the non-parametric Fisher’s exact test, by comparing, in each tissue, the number of edges between the TF and the genes in the pathway with the number of all the possible edges between the TF and the genes minus the real number of edges. The P-value was corrected using the Benjamini–Hochberg correction (Benjamini ).
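A possible way to score a TF against a pathway in one tissue is sketched below in Python. The exact contingency table is not fully specified in the text, so the sketch assumes a 2 × 2 table of pathway membership versus connection to the TF and evaluates the one-sided Fisher’s exact test as a hypergeometric upper tail. The function names and the toy numbers are hypothetical, and this should be read as an illustration rather than the authors' procedure.

```python
from math import comb


def fisher_enrichment_p(k, pathway_size, tf_degree, n_genes):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    k            : edges between the TF and genes in the pathway
    pathway_size : number of genes in the pathway
    tf_degree    : edges between the TF and any gene in the network
    n_genes      : number of candidate partner genes in the network
    Returns P(X >= k) under random assignment of the TF's edges.
    """
    upper = min(pathway_size, tf_degree)
    denominator = comb(n_genes, pathway_size)
    tail = sum(
        comb(tf_degree, x) * comb(n_genes - tf_degree, pathway_size - x)
        for x in range(k, upper + 1)
    )
    return tail / denominator


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy example: one TF tested against a 32-gene pathway in three tissues
# (edge counts and network sizes are made up for illustration).
raw = [fisher_enrichment_p(k, 32, 200, 12000) for k in (12, 1, 0)]
print(benjamini_hochberg(raw))
```

A TF would then be retained for a pathway when its corrected P-value is small only in the tissue(s) where the pathway is active.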

2.5 Animals and tissues

Primary cultures of mouse hepatocytes were obtained from two wild-type mice using Pichard’s protocol (Pichard ). Following collagenase perfusion and sedimentation, primary hepatocytes were washed and seeded at a density of 10⁵ per well in gelatin-coated dishes (0.1% Gelatin Type I from porcine skin, Sigma). Cells were cultured in Williams E medium (Gibco, 12551) supplemented with 10% heat-inactivated fetal bovine serum (FBS, Invitrogen), 50 U Penicillin–Streptomycin 100× Solution (P/S Gibco 15140-122), 1% l-glutamine. After 5 h, the medium was replaced with Williams E medium supplemented with Hepatozyme-sfm medium (Gibco 17705-021) with 5% heat-inactivated FBS (Invitrogen), 50 U Penicillin–Streptomycin 100× Solution (P/S Gibco 15140-122), 1% l-glutamine and ITS 1× (Insulin Transferrin Selenium, Sciencell 0803). HeLa cells were seeded at a density of 10⁵ per well in a 6-well plate and cultured in complete medium, Dulbecco’s modified Eagle’s medium (GIBCO BRL) supplemented with 10% heat-inactivated FBS (Invitrogen), 1% antibiotic/antimycotic solution (GIBCO BRL) and 1% l-glutamine. Primary hepatocytes and HeLa cells were maintained at 37°C in a 5% CO2-humidified incubator overnight. After 24 h, the medium was removed and replaced with starvation medium, Hank’s Balanced Salt Solution (Gibco 14025-050) supplemented with 10 mM HEPES solution. The cells were collected at different time points after the start of starvation (30 min, 1 h, 2 h, 4 h, 6 h and 8 h). The primary hepatocytes used as control were seeded at a density of 10⁵ and cultured in Williams E medium supplemented with Hepatozyme-sfm medium (Gibco 17705-021) with 5% heat-inactivated FBS (Invitrogen), 50 U Penicillin–Streptomycin 100× Solution (P/S Gibco 15140-122), 1% l-glutamine and ITS 1× (Insulin Transferrin Selenium, Sciencell 0803).
HeLa cells used as control were seeded at a density of 10⁵ and were cultured in complete medium, Dulbecco’s modified Eagle’s medium (GIBCO BRL) supplemented with 10% heat-inactivated FBS (Invitrogen), 1% antibiotic/antimycotic solution (GIBCO BRL) and 1% l-glutamine. Animal use and analyses were conducted in accordance with the guidelines of the Animal Care and Use Committee of Cardarelli Hospital in Naples and authorized by the Italian Ministry of Health.

2.6 mRNA extraction and quantitative real-time PCR

The cells were collected at each time point, and mRNA was extracted and retro-transcribed using the RNeasy mini kit and the QuantiTect reverse transcription kit (Qiagen), respectively. Quantitative real-time PCR (qRT-PCR) reactions were set up in duplicate using the LightCycler 480 SYBR green master mix (Roche), and the amplification was performed using a LightCycler 480 Real Time PCR instrument (Roche). The qRT-PCRs were carried out using different pairs of primers for human and mouse isoforms of Yeats2. Gapdh was used as control. The primer sequences for all genes are listed in Supplementary Table S4. Data analysis was performed using the LightCycler 480 Software (Roche). GAPDH mRNA levels were used to normalize the amount of mRNA, and ΔCts were calculated as the difference between the average GAPDH Ct and the average Ct for each gene. To assess whether genes change their expression significantly in the qRT-PCR experiments, we used the software package Bayesian Analysis of Time Series (Angelini ). This method is based on a Bayesian approach to automatically identify and rank differentially expressed genes from time-series data according to a Bayes Factor, where a Bayes Factor less than one means that the gene is differentially expressed with respect to the control.
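The ΔCt normalization described above amounts to simple arithmetic, sketched here in Python. The function name delta_ct and the Ct values are hypothetical and serve only as an illustration; they are not measurements from this study. With this sign convention (reference minus target), a higher ΔCt corresponds to a higher relative expression, because a lower Ct means more starting template.

```python
from statistics import mean


def delta_ct(reference_cts, target_cts):
    """ΔCt as defined in the text: average reference Ct minus average target Ct."""
    return mean(reference_cts) - mean(target_cts)


# Hypothetical duplicate Ct readings (illustration only).
gapdh_cts = [18.0, 18.2]          # reference gene
target_control_cts = [24.9, 25.1]  # target gene, rich medium
target_starved_cts = [26.0, 26.2]  # target gene, starvation

control = delta_ct(gapdh_cts, target_control_cts)
starved = delta_ct(gapdh_cts, target_starved_cts)

# A negative difference indicates lower expression under starvation.
print(starved - control)
```

In this toy case the target gene has a lower ΔCt under starvation than in the control, which would be read as downregulation.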

3 RESULTS

3.1 Construction of a semantic database for tissue-specific GEPs

One of the main hurdles in using GEPs from public repositories is the abysmal state of the experiments’ meta-data containing information on the biological samples and experimental protocols. To select tissue-specific GEPs, we built a semantic database, structured according to a human tissue ontology (Hide ), to retrieve and classify, in a semi-automatic fashion, microarrays from ArrayExpress (Parkinson ). We assigned to each GEP the correct tissue, according to the available meta-data, and kept only GEPs with a reliable annotation. We were thus able to collect 2930 high-quality GEPs (Affymetrix HG-U133A and HG-U133 Plus 2.0 platforms) for 30 different tissues (Section 2 and Supplementary Tables S5 and S6).

3.2 Reverse engineering of tissue-specific gene co-regulation networks

Although Mutual Information has been shown to be a better alternative to correlation in identifying co-regulated genes (Basso ; Belcastro ), we decided to use the SCC owing to the limited number of GEPs available in the different tissues. We also decided not to use network pruning techniques (Faith ; Margolin ; Soranzo ), as we were not interested in distinguishing between direct and indirect interactions, but rather in how co-regulation among genes changes across the different tissues. We first normalized GEPs within each tissue (Rafael ); we then computed the SCC (Hardin ) for each pair of probe-sets, retaining only those with a significant SCC (Section 2). From these 30 probe-set-wise networks, we built 30 gene-wise networks, by assigning to each probe-set the corresponding gene (Section 2).

3.3 Validation and analysis of the tissue-specific gene networks

To verify the biological relevance of the tissue-specific gene networks, we constructed two ‘gold standards’: (i) experimentally verified protein–protein interactions (Bossi ; Xuebing ) and (ii) the manually curated Reactome database of genes and proteins that participate in the same pathways (Croft ). For each network, we computed the percentage of co-regulated gene pairs for which an interaction was confirmed by one of the two gold standards [Positive Predictive Value, PPV = TP/(TP + FP), where TP and FP are the numbers of true-positive and false-positive interactions]. As shown in Supplementary Figures S2 and S3, all of the networks have a PPV significantly higher than what would be expected by chance (Section 2). As each network was constructed using a different number of GEPs, we also verified that the difference in performance across the networks was not related to the number of experiments used for the construction of each network (Supplementary Fig. S4). As a further proof of the biological relevance of the tissue-specific co-regulation networks, we identified which interactions were conserved across the majority of the 30 networks (Supplementary Table S7): 3235 co-regulatory interactions, involving 993 distinct genes, were conserved in at least half of the tissue-specific networks. Gene Ontology Enrichment Analysis of these genes revealed an enrichment for ‘housekeeping’ functions such as ribosomal and cell cycle genes (Supplementary Fig. S5).
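The PPV used for this validation can be computed as in the following Python sketch. The function name and the toy edge lists are hypothetical; the sketch also assumes that every predicted edge absent from the gold standard counts as a false positive, which is a simplification, since the text does not state whether predictions were first restricted to genes covered by the gold standard.

```python
def positive_predictive_value(predicted_edges, gold_standard_edges):
    """PPV = TP / (TP + FP) for undirected edges given as gene pairs."""
    predicted = {frozenset(edge) for edge in predicted_edges}
    gold = {frozenset(edge) for edge in gold_standard_edges}
    if not predicted:
        raise ValueError("no predicted edges")
    true_positives = len(predicted & gold)
    false_positives = len(predicted - gold)
    return true_positives / (true_positives + false_positives)


# Toy example: two of the four predicted edges are in the gold standard.
predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
gold = [("B", "A"), ("C", "D"), ("E", "F")]
print(positive_predictive_value(predicted, gold))
```

Edges are stored as unordered pairs, so ("A", "B") and ("B", "A") are treated as the same interaction.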

3.4 DINA for the identification of condition-specific pathways

Our working hypothesis is that genes belonging to a condition-specific pathway are actively co-regulated only in specific conditions when the pathway is active, but not in others, independently of their absolute level of expression. To this end, we developed a network-based algorithm, DINA, which is able to identify whether genes in a known pathway are significantly co-regulated only in specific conditions, but not in others (Fig. 1A and Supplementary Fig. S6). The algorithm starts with a set of M genes (i.e. genes belonging to a known pathway) and a set of N networks (i.e. the 30 tissue-specific gene networks). It then computes a ‘co-regulation probability’ for the M genes in each of the N networks; this probability is proportional to the number of edges among the genes in each network. DINA then quantifies how variable the co-regulation probability is across the N networks. Variability is quantified using an entropy-based measure (H), and its significance is estimated using a permutation test (Section 2). If the M genes in the set have a similar co-regulation probability across the N networks, then the entropy H will be high; on the other hand, if the M genes have a high co-regulation probability only in one (or few) networks (i.e. the pathway activity is condition-specific), then the entropy H will be low (hence, we are interested in pathways associated to a low H).

Fig. 1.

Differential network analysis. (A) Graphic description of the DINA method to quantify the variability of co-regulation among the genes in a pathway across multiple networks. (B) Graphic description of the method used to identify the transcriptional regulators of the genes in a pathway across multiple networks

3.5 Identification of tissue-specific pathways using DINA

To test whether DINA was able to identify tissue-specific pathways, i.e. pathways that are actively regulated only in specific tissues, we used the full manually curated list of 186 KEGG pathways from MsigDb (Kanehisa ; Liberzon ) including signaling, metabolic and regulatory pathways. A pathway in KEGG is a set of genes known to function as a module according to the literature. From this list, we deleted disease pathways and pathways not well represented in our networks (Section 2), thus obtaining a final list of 110 pathways. By applying DINA to the tissue-specific networks, we obtained 22 significant pathways (with corrected P ≤ 0.01, Supplementary Table S8). One of the most significant pathways (i.e. the one with lowest entropy H) was the Glycine, Serine and Threonine metabolic pathway (KEGG hsa00260). This pathway was correctly identified by DINA to be mainly regulated in liver and kidney, where most of the glycine to serine metabolism occurs (van de Poll ). Interestingly, among the 22 significant pathways, nine are metabolic pathways enriched in liver and kidney (Supplementary Table S8 pathways in bold). Figure 2A shows the co-regulation probability of the 32 genes in the Glycine, Serine and Threonine metabolic pathway in each of the 30 tissues, as previously defined; for comparison, Figure 2B shows the average expression level of the genes in the pathway in each of the tissues. Expression levels do not change significantly across the tissues, whereas the co-regulation probabilities (Fig. 2A) are strikingly different.

Fig. 2.

Differential network analysis of the Glycine pathway (KEGG hsa00260). (A) Co-regulation probability of the 32 genes in the Glycine pathway (hsa00260) across the 30 tissues. (B) Average expression level of the 32 genes in the Glycine pathway (hsa00260) across the 30 tissues (error bars represent one standard deviation).

We also checked the expression level of the genes encoding the enzymes involved in this pathway in the Gene Atlas Dataset (Su ), a compendium of normal tissues used to identify where genes are expressed. Using the canonical expression level threshold of 200 (Su ), we found that only 13 of 32 genes are expressed in liver, and only 2 of 32 are expressed in kidney (Supplementary Fig. S7). Similar considerations applied to the other significant metabolic pathways identified by DINA (with corrected P ≤ 0.01, Supplementary Table S8 pathways in bold). Hence, an approach based on expression levels (and not co-regulation) would not have been able to identify these tissue-specific metabolic pathways (for the other significant metabolic pathways refer to Supplementary Figs. S8 and S9).

3.6 Identification of transcriptional regulators of tissue-specific pathways

We wondered whether it was possible to identify TFs regulating tissue-specific pathways identified by DINA. We reasoned that a TF controlling a tissue-specific pathway may be co-regulated with its target genes only in that tissue but not in others (Fig. 1B). As the regulation of metabolic pathways has been well studied in the past, we decided to identify TFs involved in the regulation of the nine metabolic pathways previously identified by DINA. To this end, we used a list of 1358 human genes including both genes whose protein product has a verified TF activity (Ravasi ) and genes encoding proteins with an indirect transcriptional activity, such as co-factors or scaffolding proteins (Stella ). For each of the nine metabolic pathways previously identified as tissue-specific, and for each TF in the list, we applied Fisher’s exact test (Section 2) to select TFs sharing a significant number of edges with the genes in the pathway only in the tissue(s) where the pathway is active, as shown in Figure 1B. The regulators identified for each pathway are reported in Supplementary Tables S9–S17 [together with the Benjamini–Hochberg corrected P-value (Benjamini )]. Table 1 lists the TFs controlling the majority (i.e. seven of nine) of the metabolic pathways according to our analysis (Section 2). Considering only genes encoding proteins with a known TF activity (Table 1 in bold), we correctly identified many nuclear receptors as specific regulators of these pathways (NR1H4, NR1I3, ESRRG, HNF4A). The nuclear receptor super-family is one of the largest groups of TFs involved in the regulation of different metabolic processes (Francis ), such as the regulation of liver metabolism (Elfaki ).

Table 1.

Transcription Factors identification

Symbol	Name	Role	Citations
NR1H4	nuclear receptor subfamily 1, group H, member 4	activator	(Forman et al., 1995; Makishima et al., 2005; Vazquez, 2012)
ESRRG	estrogen-related receptor gamma	activator	(Makishima et al., 2005; Sanoudou et al., 2010)
TRPS1	trichorhinophalangeal syndrome I	inhibitor
NR1I3	nuclear receptor subfamily 1, group I, member 3	activator	(Bauer et al., 2004; Makishima et al., 2005; Miao et al., 2006)
HNF4A	hepatocyte nuclear factor 4, alpha	activator	(Makishima et al., 2005; Rommel et al., 2003)
ZNF394	zinc finger protein 394	inhibitor
TBR1	T-box, brain, 1	activator
DAB2	disabled homolog 2	activator
DIP2C	disco-interacting protein 2	activator
TRIM15	tripartite motif-containing 15	activator
ASB9	ankyrin repeat and SOCS box-containing 9	activator
YEATS2	YEATS domain containing 2	inhibitor
SIRT4	sirtuin 4	activator	(Chalkiadaki et al., 2012; Nargis et al., 2010; Nidhi et al., 2007)

Note: List of TFs regulating the majority (i.e. seven of nine) of the tissue-specific metabolic pathways. In bold, genes with known TF activity; in normal text, genes encoding proteins indirectly involved in transcription.

For example, one of the six receptors is HNF4A, probably the most famous nuclear receptor in liver, whose mutations are responsible for monogenic autosomal dominant non-insulin-dependent diabetes mellitus type I (Desvergne ). The protein encoded by this gene controls the expression of several genes, including hepatocyte nuclear factor 1 alpha, a TF that regulates the expression of several hepatic genes. When we also considered genes encoding proteins indirectly involved in transcription (Ravasi ) (Table 1 not in bold), we identified, among others, SIRT4 (sirtuin 4), a member of the sirtuin family that plays a key role in metabolic response (Chalkiadaki ; Nargis ; Nidhi ).

3.7 YEATS2: a negative transcriptional regulator of metabolic pathways

YEATS2 was predicted to be the most significant negative regulator shared by most of the metabolic pathways (Table 1). YEATS2 is expressed at low levels in both liver and kidney (Wu ), and little is known about its function. Recently, it has been demonstrated that YEATS2 interacts with the Ada-Two-A-Containing complex (Wang ), which, together with Spt-Ada-Gcn5-Acetyl-Transferase, is able to modulate transcription, both by causing chromatin modification and by interacting with the TATA-binding proteins (Krebs ; Wang ). To validate our prediction about the involvement of YEATS2 in the transcriptional regulation of metabolism in liver, we decided to further investigate its function by perturbing hepatocyte homeostasis by starvation (Ding ). During starvation, a switch from anabolism to catabolism occurs (Caro-Maldonado ): cells start to mobilize stored nutrients, such as glycogen and triglycerides, cell growth is arrested and autophagy is promoted (Ding ; Levine ). During starvation, there are large changes in gene expression that affect specific metabolic pathways. For example, genes involved in fatty acid β-oxidation are upregulated (Bauer ), whereas genes involved in biosynthesis are downregulated (Sokolovic ). We performed a starvation time-course experiment for 8 h in both primary murine hepatocytes and HeLa cells, by switching cells from a nutrient-rich medium to a starvation medium (Section 2). Cells were collected at different time points during starvation (30 min, 1 h, 2 h, 4 h, 6 h, 8 h). Cells grown in nutrient-rich medium were used as control. We measured by qRT-PCR the variation in the expression level of YEATS2 in response to starvation in primary hepatocytes and HeLa cells (Fig. 3 and Supplementary Fig. S10). Yeats2 is an early response gene, quickly downregulated on starvation during the first 2 h in primary hepatocytes, as shown in Figure 3.

Fig. 3.

Yeats2 expression in hepatocyte cells during starvation. Real-time quantitative PCR measurements of the expression of Yeats2 and a set of marker genes at the indicated time-points following starvation. CRT indicates cells in rich medium. BF indicates the Bayes Factor estimated using the Bayesian Analysis of Time Series algorithm. The gray area represents the standard deviation across the two biological replicates. Gene expression was quantified using the ΔCT method with Gapdh used as normalization gene.

We also analyzed the expression profiles of a subset of genes whose expression levels increase following starvation (Bauer ; Yoon ): Pgc1a, Acaa1a, Acot2, Cyp4a10, Cyp4a14 and ApoA4 (Fig. 3). Moreover, we measured the expression profiles of CRAT, CTSB and PLIN2 in HeLa cells as shown in Supplementary Fig. S10. These selected genes were upregulated, as expected, during the first 4 h of starvation, as shown in Figure 3: Pgc1a (Peroxisome proliferator-activated receptor gamma, co-activator 1 alpha), encoding a transcriptional co-activator that plays a key role in the regulation of both carbohydrate and lipid metabolism (Leone ); Acaa1a (Acetyl-CoA acyltransferase 1A), encoding a peroxisomal thiolase operating in catabolism of fatty acids (Bauer ), together with Acot2 (Acyl-CoA thioesterase 2), which is localized in peroxisomes (Hunt ); Cyp4a10 (Cytochrome P450, family 4, subfamily a, polypeptide 10) and Cyp4a14 (Cytochrome P450, family 4, subfamily a, polypeptide 14), encoding two members of the Cytochrome P450 family able to oxidize a variety of structural compounds, as well as fatty acids (Bauer ; van den Bosch ). Genes involved in lipid transport were upregulated as well, such as ApoA4 (Apolipoprotein A4), which enhances lipid absorption by promoting the assembly and secretion of chylomicrons (Bauer ; Yao ).
To probe further the role of Yeats2 and its involvement in regulation of metabolism in liver, we analyzed an existing in vivo time-series microarray experiment (ArrayExpress ID E-MEXP-748) from liver, muscle and adipose tissue of ApoE3Leiden transgenic mice, exhibiting a humanized lipid metabolism, treated with high-fat diet (HFD) for 0, 1, 6, 9, or 12 weeks (Kleemann ). On HFD feeding, genes involved in metabolic pathways, such as lipid metabolic processes, were found to be upregulated in liver (Kleemann ). Based on these observations, we decided to investigate the expression of Yeats2 in this mouse model considering only the liver tissue, and we found that Yeats2 expression is strongly downregulated in HFD mouse liver (P-value of ) (Kapushesky ; Kleemann ). Our results support a previously unreported role of the scaffolding protein YEATS2 in transcriptional control of the metabolic response and demonstrate that DINA can be applied to identify regulators of tissue-specific pathways.

3.8 Identification of disease-specific pathways dysregulation using DINA

Gene expression alteration is a common molecular hallmark of cancer progression. The identification of cancer genetic signatures has been successfully exploited for understanding the mechanisms of cancer development (Watters ), as well as for anticancer therapy selection (Rothenberg ) and disease prognosis (Dracopoli ). Moreover, specific cancer-regulated gene networks have been identified (Mani ; Stella ). We wondered whether DINA could be successfully used to identify selective alterations of co-regulated gene networks in cancer. As a study model, we focused on hepatocellular carcinoma (HCC), as several cell lines modeling HCC progression are available, as well as GEPs measured in these cell lines. HCC progression involves alterations in many signaling pathways, such as the EGF-Ras-MAPK, AKT-mTOR, Jak-Stat and NF-kB cascades (Llovet ). In addition, inactivating mutations of the tumor suppressor p53, or p53 loss of expression, are among the most frequent genetic events associated with hepatocyte transformation (Bressac ; Hinds ), and the dysregulation of p53-dependent genes has been observed in HCC (Hailfinger ; Hsu ). Here, we collected 161 GEPs (Supplementary Table S18) for three human cell lines: primary hepatocytes (40 GEPs), the hepatoblastoma-derived cell line HepG2 (39 GEPs) and the hepatocarcinoma-derived cell line Huh7 (82 GEPs). We first tested the ability of DINA to identify DC of p53-dependent genes across the three cell lines. To this purpose, we built a gene signature made up of 34 experimentally validated direct transcriptional targets of p53 (Lim ), and we then applied DINA to this gene signature, as shown in Figure 4.

Fig. 4.

Differential Network Analysis of the p53 gene signature in primary and transformed hepatocytes. The gene signature consists of 34 bona fide transcriptional targets of p53. (A) p53 expression level in the three cell lines for the two probes present in the Affy HG-U133A platform. (B) Comparison between the co-regulation probability of the genes in the signature (black) and their average expression level.

DINA successfully detected a DC of the p53 target genes across the three cell lines: the co-regulation probability is high in normal hepatocytes and, to a lesser extent, in the hepatocellular carcinoma HepG2 cell line, which carries a wild-type p53 protein, and decreases significantly in the Huh7 cell line, which carries an inactive p53 protein (Bressac ) (Fig. 4B). Interestingly, the expression level of the p53-target genes did not correlate with the functional status of the p53 protein in the different cell lines, thus supporting our previous observation (Fig. 4A) that an expression-based method would be less powerful than DINA in identifying dysregulated pathways. We next applied DINA to identify dysregulated pathways during hepatocyte transformation. The DINA-based analysis of the 110 KEGG pathways identified at least four pathways whose co-regulation is significantly disrupted in the HCC cell lines compared with the normal hepatocytes (Supplementary Table S19). As in the previous results, the average expression levels of the genes in these pathways did not change significantly between normal and transformed hepatocytes.
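The comparison of co-regulation across cell lines can be sketched as follows. This is only an illustration under a stated assumption: the co-regulation probability of a gene signature in a network is taken here to be the fraction of possible gene pairs in the signature that are connected by an edge. The exact definition used by DINA is given in the Methods and may differ. The gene labels and edge lists are invented toy data, and `coregulation_probability` is a hypothetical helper name.

```python
from itertools import combinations


def coregulation_probability(genes, edges):
    """Fraction of unordered gene pairs in `genes` linked by an edge in `edges`."""
    genes = sorted(set(genes))
    if len(genes) < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}   # undirected edges
    pairs = list(combinations(genes, 2))
    linked = sum(1 for pair in pairs if frozenset(pair) in edge_set)
    return linked / len(pairs)


# Toy signature and toy condition-specific networks (not real data).
signature = ["G1", "G2", "G3", "G4"]
networks = {
    "primary": [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G1", "G4")],
    "HepG2": [("G1", "G2"), ("G2", "G3"), ("G3", "G4")],
    "Huh7": [("G1", "G2")],
}

probs = {name: coregulation_probability(signature, edges)
         for name, edges in networks.items()}
for name, p in probs.items():
    print(f"{name}: {p:.2f}")
```

In this toy example the fraction of connected pairs drops from the primary cells to the two transformed lines even though nothing is assumed about the expression level of the genes themselves.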
Interestingly, the most significant loss of co-regulation observed in transformed hepatocytes involves the peroxisome metabolism (KEGG ko04146), the primary bile acid biosynthesis (map00120) and the glyoxylate and dicarboxylate metabolism (map00630): these pathways are responsible for fundamental functions in liver cells such as the synthesis of bile acids and cholesterol, the oxidation of fatty acids, and the metabolism of phenylalanine, glyoxylate and tyrosine. Moreover, among the other dysregulated pathways identified by DINA, we found disruption of fundamental pathways regulating liver cancer progression such as the PPAR signaling pathway (Supplementary Table S19). To gain further insights into the dysregulation of the peroxisome metabolism, we analyzed the changes in the gene co-regulation network among the corresponding genes across the three cell lines. Figure 5A and B show that there is a major loss of co-regulation among peroxisome-related genes in both HepG2 and Huh7 HCC cell lines; moreover, this loss mainly results from dysregulation of genes involved in peroxisomal fatty acid β-oxidation (e.g. ACOX, EHHADH, ACAA1) and genes involved in the control of H2O2 metabolism (e.g. CAT and SOD). Notably, these genes are regulated by the peroxisome proliferator-activated receptor alpha (PPARalpha) (Reddy ) and the LXR family TFs (Hu ).

Fig. 5.

Differential Network Analysis of the peroxisome KEGG pathway (M6391) in primary and transformed hepatocytes. Genes in the peroxisome pathway are represented as circles; a significant co-regulation between two genes as a line. The size of the circles is proportional to the difference in the number of edges between the network in transformed hepatocytes versus primary hepatocytes. Gray lines represent edges lost in the network compared with primary cells. (A) HepG2 versus primary hepatocyte; (B) Huh7 versus primary hepatocyte.

Thus, our results indicate that the dysregulation in the activity of these liver-specific transcription regulators may represent a recurrent event associated with HCC. Consistent with these results, peroxisome and PPARalpha pathway alterations have been definitely associated with liver cell proliferation and with HCC development (Gonzalez ), confirming the efficacy and specificity of the DINA algorithm in identifying condition-specific pathway regulation.
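The quantities displayed in Figure 5 (edges lost relative to primary cells, and the per-gene difference in the number of edges) can be computed with a few set operations. The sketch below assumes undirected networks given as lists of gene pairs; the helper names and the toy edge lists are hypothetical and do not correspond to the peroxisome pathway data.

```python
from collections import Counter


def normalize(edges):
    """Represent undirected edges as a set of frozensets."""
    return {frozenset(e) for e in edges}


def lost_edges(reference_edges, condition_edges):
    """Edges present in the reference network but absent in the condition."""
    return normalize(reference_edges) - normalize(condition_edges)


def degree(edges):
    """Number of edges incident to each gene."""
    counts = Counter()
    for edge in normalize(edges):
        for gene in edge:
            counts[gene] += 1
    return counts


def degree_difference(reference_edges, condition_edges):
    """Per-gene change in edge count (condition minus reference)."""
    ref, cond = degree(reference_edges), degree(condition_edges)
    return {gene: cond[gene] - ref[gene] for gene in set(ref) | set(cond)}


# Toy networks with placeholder gene labels (not real data).
primary = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
transformed = [("A", "B"), ("C", "D")]

lost = lost_edges(primary, transformed)
diff = degree_difference(primary, transformed)
print(sorted(tuple(sorted(e)) for e in lost))
print(dict(sorted(diff.items())))
```

In a plot of the kind described in the legend, the absolute value of each entry in `diff` would set the circle size, and the pairs in `lost` would be drawn as gray lines.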

4 DISCUSSION AND CONCLUSION

In this study, we hypothesized that genes belonging to a tissue-specific pathway are actively regulated, and hence co-regulated, only in specific tissues where the pathway is active, but not in others, independently of their absolute level of expression. We proposed an approach (DINA) based on quantifying the variability in the co-regulation probability of genes across tissues or conditions. DINA is based on detecting differences in the number of edges among genes in a pathway across a set of networks, and, therefore, it can be applied to any kind of network, independently of how this is generated. DINA, however, is not able to detect distinct network topologies that have equal density. Unlike other methods, DINA does not aim at identifying de novo subnetworks of genes, but rather at identifying whether a known pathway (or a set of genes of interest) is differentially co-regulated across a set of conditions. We derived 30 tissue-specific gene co-regulation networks and identified several metabolic pathways as the most differentially regulated across the tissues, and specifically active in liver and kidney. Usually, tissue specificity of a gene, or of a pathway, is assessed by quantifying the expression level of the genes in the concerned tissue (Shlomi ). However, observing gene expression alone may not be sufficient, as in the case of metabolic pathways (Gille ). Here, we show that an alternative possibility is to check whether the genes involved in the same pathway are specifically co-regulated in the concerned tissue. Of note, a similar approach has been successfully applied in yeast (Kharchenko ). We also demonstrated that tissue-specific targets of a TF tend to be co-regulated with the TF in a tissue-specific manner. Hence, we developed a new method based on Fisher's exact test to identify tissue-specific TFs.
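As an illustration of how Fisher's exact test can be used in this setting, the sketch below computes a one-sided P-value for a 2×2 contingency table directly from the hypergeometric distribution. The layout of the table is an assumption made for this example (genes connected or not connected to a TF in a tissue network, cross-classified by pathway membership); the contingency table actually used by DINA is defined in the Methods. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` items in the top-left cell given the margins.
    """
    row1 = a + b          # e.g. genes connected to the TF
    col1 = a + c          # e.g. genes in the pathway
    n = a + b + c + d     # all genes considered
    total = comb(n, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Hypothetical counts: 8 of the 10 genes connected to a TF are in the
# pathway, against 12 of the 90 genes not connected to it.
p_value = fisher_one_sided(8, 2, 12, 78)
print(f"one-sided P = {p_value:.2e}")
```

A small P-value in such a table would indicate that the TF's neighbours in that tissue network are enriched for pathway genes. In practice an established implementation such as `scipy.stats.fisher_exact` would normally be preferred over a hand-written one.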
We tested this approach to identify regulators of tissue-specific metabolic pathways and correctly identified Nuclear Receptors as their main regulators. We were also able to identify a new putative tissue-specific negative regulator of hepatocyte metabolism (Yeats2). Finally, we showed that DINA can be used to analyze GEPs obtained during disease progression to make hypotheses on dysregulated pathways. The identification of differentially expressed genes in disease compared with normal conditions is a standard practice in laboratories all over the world, and it has led to countless new discoveries. However, differentially expressed genes are only a proxy for finding dysregulated pathways. Indeed, the real question one would like to answer is which pathways become dysregulated during disease progression, to understand the pathogenic mechanisms. Recent efforts have shown that high-throughput phospho-proteomics in conjunction with signaling network models can be used to identify differences in signaling pathways between normal and transformed hepatocytes (Saez-Rodriguez ). Here, we demonstrated that DINA is able to gain information about HCC-specific metabolic and transcriptional pathway dysregulation by quantifying changes in co-regulation among genes across primary and transformed hepatocytes. It remains to be seen whether changes in signaling pathway activity can be detected using only a transcription-based approach such as DINA. We also implemented an on-line web tool (http://dina.tigem.it) enabling the user to apply DINA to identify tissue-specific pathways or gene signatures.

84 in total

1. Finding disease specific alterations in the co-expression of genes.

Authors: Dennis Kostka; Rainer Spang
Journal: Bioinformatics Date: 2004-08-04 Impact factor: 6.937

Review 2. Developing gene expression signatures of pathway deregulation in tumors.

Authors: James W Watters; Christopher J Roberts
Journal: Mol Cancer Ther Date: 2006-10 Impact factor: 6.261

3. Reactome: a database of reactions, pathways and biological processes.

Authors: David Croft; Gavin O'Kelly; Guanming Wu; Robin Haw; Marc Gillespie; Lisa Matthews; Michael Caudy; Phani Garapati; Gopal Gopinath; Bijay Jassal; Steven Jupe; Irina Kalatskaya; Shahana Mahajan; Bruce May; Nelson Ndegwa; Esther Schmidt; Veronica Shamovsky; Christina Yung; Ewan Birney; Henning Hermjakob; Peter D'Eustachio; Lincoln Stein
Journal: Nucleic Acids Res Date: 2010-11-09 Impact factor: 16.971

4. Functional inhibitory cross-talk between constitutive androstane receptor and hepatic nuclear factor-4 in hepatic lipid/glucose metabolism is mediated by competition for binding to the DR1 motif and to the common coactivators, GRIP-1 and PGC-1alpha.

Authors: Ji Miao; Sungsoon Fang; Yangjin Bae; Jongsook Kim Kemper
Journal: J Biol Chem Date: 2006-02-21 Impact factor: 5.157

5. The orphan nuclear receptor HNF4alpha determines PXR- and CAR-mediated xenobiotic induction of CYP3A4.

Authors: Rommel G Tirona; Wooin Lee; Brenda F Leake; Lu-Bin Lan; Cynthia Brimer Cline; Vishal Lamba; Fereshteh Parviz; Stephen A Duncan; Yusuke Inoue; Frank J Gonzalez; Erin G Schuetz; Richard B Kim
Journal: Nat Med Date: 2003-01-06 Impact factor: 53.440

Review 6. Nuclear receptors and the control of metabolism.

Authors: Gordon A Francis; Elisabeth Fayard; Frédéric Picard; Johan Auwerx
Journal: Annu Rev Physiol Date: 2002-05-01 Impact factor: 19.318

7. graphite - a Bioconductor package to convert pathway topology to gene network.

Authors: Gabriele Sales; Enrica Calura; Duccio Cavalieri; Chiara Romualdi
Journal: BMC Bioinformatics Date: 2012-01-31 Impact factor: 3.169

8. Molecular Mechanisms Underlying the Link between Nuclear Receptor Function and Cholesterol Gallstone Formation.

Authors: Mary Carmen Vázquez; Attilio Rigotti; Silvana Zanlungo
Journal: J Lipids Date: 2011-11-01

9. CoXpress: differential co-expression in gene expression data.

Authors: Michael Watson
Journal: BMC Bioinformatics Date: 2006-11-20 Impact factor: 3.169

10. Statistical methods for gene set co-expression analysis.

Authors: YounJeong Choi; Christina Kendziorski
Journal: Bioinformatics Date: 2009-08-18 Impact factor: 6.937
