Literature DB >> 28163897

Differentially correlated genes in co-expression networks control phenotype transitions.

Lina D Thomas¹, Dariia Vyshenska², Natalia Shulzhenko³, Anatoly Yambartsev¹, Andrey Morgun².

Abstract

BACKGROUND: Co-expression networks are a tool widely used for analysis of "Big Data" in biology that can range from transcriptomes to proteomes, metabolomes and more recently even microbiomes. Several methods were proposed to answer biological questions interrogating these networks. Differential co-expression analysis is a recent approach that measures how gene interactions change when a biological system transitions from one state to another. Although the importance of differentially co-expressed genes to identify dysregulated pathways has been noted, their role in gene regulation is not well studied. Herein we investigated differentially co-expressed genes in a relatively simple mono-causal process (B lymphocyte deficiency) and in a complex multi-causal system (cervical cancer).
METHODS: Co-expression networks of B cell deficiency (Control and BcKO) were reconstructed using Pearson correlation coefficient for two mus musculus datasets: B10.A strain (12 normal, 12 BcKO) and BALB/c strain (10 normal, 10 BcKO). Co-expression networks of cervical cancer (normal and cancer) were reconstructed using local partial correlation method for five datasets (total of 64 normal, 148 cancer). Differentially correlated pairs were identified along with the location of their genes in BcKO and in cancer networks. Minimum Shortest Path and Bi-partite Betweenness Centrality where statistically evaluated for differentially co-expressed genes in corresponding networks.
Results: We show that in B cell deficiency the differentially co-expressed genes are highly enriched with immunoglobulin genes (causal genes). In cancer we found that differentially co-expressed genes act as "bottlenecks" rather than causal drivers with most flows that come from the key driver genes to the peripheral genes passing through differentially co-expressed genes. Using in vitro knockdown experiments for two out of 14 differentially co-expressed genes found in cervical cancer (FGFR2 and CACYBP), we showed that they play regulatory roles in cancer cell growth.
CONCLUSION: Identifying differentially co-expressed genes in co-expression networks is an important tool in detecting regulatory genes involved in alterations of phenotype.

Entities: CellLine Chemical Disease Gene Species

Keywords: biological state transition; co-expression networks; differential co-expression analysis

Year: 2016 PMID： 28163897 PMCID： PMC5247791 DOI： 10.12688/f1000research.9708.1

Source DB: PubMed Journal: F1000Res ISSN： 2046-1402

Introduction

Recent technological advances have moved the focus of biologists from how to measure biological parameters to how to analyze and interpret tens of thousands of measurements, frequently called omics data. The first solutions for such a problem were limited to hierarchical clustering [1– 3] and simple comparisons between classes of data through the identification of differentially expressed genes (DEGs) [4, 5]. Nowadays, reconstruction and interrogation of biological networks have become a widely used approach to get insights from different types of omics data [6, 7]. After establishing co-expression networks for different states of one biological system, differential co-expression analysis investigates their structural changes when a system goes through a state transition. This analysis, first proposed more than a decade ago [8, 9], identifies the pairs of genes that have their interaction changed during such transition. Several later publications have suggested different algorithms and statistics to determine differential gene co-expression [10– 27]. Fewer studies, however, attempted to evaluate the biological significance of these changes [18, 21]. Also, to the best of our knowledge, there have been no studies that would investigate how this approach performs depending on the type and complexity of the biological system analyzed. Commonly, a state transition of a biological system is related to perturbation of a set of genes, which propagates through network interactions and affects other genes. Thus, there is a possibility that differentially co-expressed (DC) genes (directly or indirectly) contribute to the propagation of perturbations. In order to investigate the role of DC genes in a state transition of a biological system, we considered two biological processes [28, 29] previously analyzed by our group. The first one (B cell deficiency in mice) is a homogenous, one-causal-factor process, while the second one (cervical cancer) represents a heterogeneous multi-causal system. In this work, a co-expression network is an undirected graph, where the set of nodes consists of a set of DEGs, and a pair of nodes is connected if there is a significant correlation between them. Differential co-expression analysis is done by identifying the pairs of genes that suffer significant changes in correlation between two states. Throughout this paper such pairs are called differentially correlated pairs (DCPs) and the genes forming these pairs are considered DC genes.

Results

B cell deficiency

We started by analyzing the B cell knockout (BcKO) data [28], which represents a relatively simple experimental model with only one causal factor (B lymphocytes) and homogenous subject groups since this experiment was performed in highly inbred strains of mice. In order to select the nodes to reconstruct the co-expression networks (BcKO and Control) we compared gene expression in jejunum between BcKO and control mice and found 509 DEGs ( Dataset 1). Next, the edges for each network were determined using significantly correlated pairs of DEGs ( Figure 1). To identify DCPs we used the method introduced in [21] which compares correlations in the BcKO group and in the Control group. Eighty DCPs were found ( Dataset 2), of which 56 represent correlation gains (edges which were not present in Control network but showed up in BcKO) and 24 represent losses.

Figure 1.

Co-expression networks for BcKO data.

Contains p-values, ratios of expression means, combined Fisher’s p-value, fdr, direction of regulation, whether it is Ig gene and whether it is DC gene. Click here for additional data file. Contains information such as “change direction” (whether each pair gained or lost correlation/edge), “sign of local partial correlation” in BcKO data and control data, “regulation” (whether each gene of each pair is up- or down-regulated in BcKO), “number of Ig genes” in each pair. Click here for additional data file. Now we investigate whether network structural changes, herein represented by DCPs, are related to actual causes of global change in gene expression. In the previous study [28], it was shown that intestinal gene expression alterations in BcKO mice are mostly dependent on the ability of B lymphocytes to produce antibodies. Therefore, we analyzed the presence of immunoglobulin coding genes (Ig genes, see Dataset 3) among differentially expressed genes (26 Ig genes among 509 DEGs) in DCPs. We observed that 72% (39 out of 54) of correlation gain DCPs are formed by at least one Ig gene, ( Figure 2A). Moreover, we found strong enrichment for Ig genes among DC genes in correlation gain (24% (15 out of 63) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes), while no enrichment was observed for correlation lost as a result of B cell deficiency ( Figure 2B). Thus, these results support the idea that differentially expressed genes that acquire correlations during transition from one biological state to another have a high chance to play causal roles in such transition.

Figure 2.

A) 78 Differentially Correlated Pairs (DCPs) were found, of which 54 represent correlation gains (edges which were not present in Control network but showed up in BcKO) and 24 represent correlation losses. The table stratifies the set of pairs representing correlation gains and losses according to the amount of Ig genes (0, 1 or 2) present in a pair. Note that 39 out of 54 of correlation gain DCPs are formed by at least one Ig gene while only 2 out of 22 correlation losses have at least one Ig gene. B) The 78 DCPs are formed by a total of 94 Differentially Co-expressed genes (DC genes). 58 DC genes participate only in correlation gain DCPs, 31 only in correlation loss DCPs and 5 of them participate in both correlation gain and loss DCPs. The results show enrichment for Ig genes among DC genes in correlation gain: 24% (15 out of 63 (=58+5)) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes (p value < 0.001). Meanwhile no enrichment was observed for correlation loss as a result of B cell deficiency: 3% (1 out of 36 (=31+5)) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes.

Co-expression networks for BcKO data.

The nodes are composed by DEGs and the edges represent significant correlations between nodes. The causal genes (immunoglobulin genes) and the DCP edges are concentrated in the high connectivity region with several causal genes forming DCPs. A) 78 Differentially Correlated Pairs (DCPs) were found, of which 54 represent correlation gains (edges which were not present in Control network but showed up in BcKO) and 24 represent correlation losses. The table stratifies the set of pairs representing correlation gains and losses according to the amount of Ig genes (0, 1 or 2) present in a pair. Note that 39 out of 54 of correlation gain DCPs are formed by at least one Ig gene while only 2 out of 22 correlation losses have at least one Ig gene. B) The 78 DCPs are formed by a total of 94 Differentially Co-expressed genes (DC genes). 58 DC genes participate only in correlation gain DCPs, 31 only in correlation loss DCPs and 5 of them participate in both correlation gain and loss DCPs. The results show enrichment for Ig genes among DC genes in correlation gain: 24% (15 out of 63 (=58+5)) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes (p value < 0.001). Meanwhile no enrichment was observed for correlation loss as a result of B cell deficiency: 3% (1 out of 36 (=31+5)) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes. Contains the Ig genes considered causal along with annotation and whether they are considered DC genes or not. Click here for additional data file.

Cervical cancer

In order to study differentially co-expressed genes in a more complex biological model we turned to cancer. It is well known that cancers of the same clinically/morphological type can be very different on molecular levels. One of the most studied causes for such diversity is the different sets of chromosomal aberrations and mutations harbored by tumors otherwise defined as the same cancer. In previous study [29], we have found 36 cervical cancer driver genes located in multiple chromosomal aberrations ( Dataset 4). Thus we decided to use cervical cancer data from 29 for investigation of the role of DCPs in complex biological processes due to its heterogeneity and previously acquired knowledge of essential causal genes. Contains the chromosomal aberration genes considered causal along with annotation and whether they are considered DC genes or not. Click here for additional data file. We used the DEGs between tumor and normal tissue as the nodes of the co-expression networks. Since the number of samples (five datasets, 148 tumor samples and 67 normal samples) was larger than in BcKO study (two datasets, 22 paired samples), we used the partial correlation coefficient as a measure of co-expression ( Figure 3). The potential advantage of using partial correlation is that it aims to infer edges that are a result of direct regulatory relations [6]. Partial correlations were calculated through the Local Partial Correlation (LCP) method [30] ( Material and Methods).

Figure 3.

Co-expression networks for cervical cancer data.

Co-expression networks for cervical cancer data.

The nodes are composed by DEGs and the edges represent significant local partial correlation between nodes. A few causal genes (key drivers) and DCP edges are located in the high connectivity region, but scattered throughout the network. Only one key driver is amongst the genes in DCPs. In this study seven DCPs composed of 14 DC genes were found. Interestingly, all DCPs were differential correlations gained in tumors ( Table 1). Only one of the 36 key drivers (CEP70) was identified among the 14 DC genes. Accordingly, no enrichment of key driver genes among DC genes was detected in this analysis.

Table 1.

DCPs – cancer (* key drivers).

Gene symbol 1	Gene symbol 2	Change direction	Sign of local partial correlation in tumor	Regulation 1	Regulation 2
ANP32E	CACYBP	Gained edge	> 0	UP	UP
CENPN	DHFR	Gained edge	> 0	UP	UP
C10orf68	FGFR2	Gained edge	> 0	DN	DN
AK2	HNRNPR	Gained edge	> 0	UP	UP
CEP70*	SEPHS1	Gained edge	> 0	UP	UP
NIPAL2	TRPM3	Gained edge	> 0	DN	DN
They stem ARHGEF12	ZSCAN18	Gained edge	> 0	DN	DN

Even though we observed that DCPs are not necessarily formed by key drivers, it is known from literature that most of the DC genes found play regulatory roles in other types of cancer [31– 48]. Thus we hypothesized that DCPs are located downstream of key drivers and can be responsible for changes of regulatory chain events coming from the key drivers and spreading throughout the network. In order to verify this hypothesis, we investigated how close DC genes are to key drivers and whether their “signal flow” [49] in the tumor co-expression network is stronger than that of the other genes. In order to verify this hypothesis we investigated two network measures: Minimum Shortest Path and Bi-partite Betweenness Centrality. First we compared the shortest paths from key driver genes to DC genes and to all other DEGs in the network. We found that DC genes are located statistically closer than the rest of genes in the network to key drivers ( Figure 4A, Wilcoxon test < 0.014 and Permutation test < 0.021). Then we used Bi-partite Betweenness Centrality [6] as a measure of the signal flow from key drivers to peripheral genes (genes with only one edge) [6]. We evaluated this measure for DC genes and remaining DEGs and observed that DC genes had much higher values than other genes in the network. Figure 4B illustrates a comparison of boxplots of bi-partite betweenness centrality between these two groups concerning DCPs and the rest (non DCPs, non-key drivers, non-peripheral). We can observe that the bi-partite betweenness centralities of DCPs are concentrated in higher values than the rest. Mann-Whitney test gave us a p-value of 7.868 X 10 -5, which gives us evidence that the distribution of Bi-Partite Betweenness Centrality in DCP genes is higher. For more details see Figure S2. Thus, altogether these results suggest that DC genes might be “bottlenecks”, that is, required to transmit a signal from key drivers to other genes in the network, therefore, supplement the hypothesis of a regulatory role of DC genes ( Figure S1).

Figure 4.

Topological properties of Differentially Correlated Genes (DCGs).

Topological properties of Differentially Correlated Genes (DCGs).

A) Barplot of the shortest path to the causal genes and originated in either the genes in DCPs (in orange) or the non DCP genes (in blue). The distribution in orange is concentrated in lower values. B) Boxplot comparing the values of Bipartite Betweenness Centrality of the genes in DCPs (in orange) and the non-DCP genes (in blue). The boxplot on the left is concentrated in higher values. In addition, data from other cancers provide indirect support for the idea of regulatory role of DC genes in cervical cancer [31– 48]. However, since a role of these DC genes in carcinogenesis was not as straightforward as for immunoglobulin genes in B cell deficiency, we decided to perform experimental tests. Among the DC genes found for cervical cancer, there were seven up-regulated and seven down-regulated in cancer. Therefore, for validation experiments we chose one down-regulated (FGFR2) and one up-regulated (CACYBP) gene that have not been previously studied in cervical cancer for regulatory properties, but have a potential connection with cell death or proliferation based on their Gene Ontology annotations. In order to test if FGFR2 and CACYBP play critical regulatory roles in cancer pathogenesis, we evaluated the effect on in vitro knockdown of these genes on cell proliferation in a cervical carcinoma cell line. First, we tested two cervical cancer cell lines (Hela and ME180) and found that only ME180 had detectable expression levels of both genes. In order to perform these tests, we evaluated siRNAs and observed that they were able to knock down expression of both genes by at least 70% ( Figure 5A). CACYBP is up-regulated in tumor tissue, as compared to normal tissue ( Figure 5B). Consequently, if CACYBP has regulatory potential, as predicted by our analysis, it should function as an oncogene promoting cell proliferation. Therefore, the knockdown of this gene should result in a decrease of cell growth/survival. Since FGFR2 was found down-regulated in cervical carcinomas ( Figure 5B) its potential regulatory role would be as a tumor suppressor. Therefore, the knockdown of this gene is expected to increase cell growth. The subsequent analysis of cell proliferation confirmed our predictions for both genes: knockdown of CACYBP led to a decrease of cell growth, while knockdown of FGFR2 induced higher cell proliferation ( Figure 5C). Thus, these results provide additional support to our in silico prediction that DC genes may play a regulatory role in cell proliferation related to tumor growth.

Figure 5.

Experimental evaluation of DCGs in cervical cancer.

A) Efficacy of FGFR2 and CACYBP siRNA knockdown. qRT-PCR with primers for GAPDH as the internal control was used to determine expression and efficacy of FGFR2 and CACYBP specific siRNA knockdown in endothelial cells (ME180). ME180 cells were harvested 72 h after transfection with vehicle (Lipofectamine) and either scrambled control or targeting siRNA. B) Gene expression of FGFR2 and CACYBP (mean +/- standard deviation) for tumor and normal samples from five datasets used in the analysis. Since FGFR2 was found down-regulated in tumor tissue, its potential regulatory role would be as a tumor suppressor. However, CACYBP is up-regulated, thus CACYBP should function as an oncogene promoting cell proliferation. C) Evaluation of cell proliferation inhibition using xCelligence System. Proliferation data (cell index) was obtained at 72 h after transfection with Lipofectamine and either scrambled control or targeting siRNA. Inhibition index was calculated (two step normalization of cell index): inhibition index > 0 – cells transfected with targeting siRNA showed decrease in proliferation; < 0 – showed increase in proliferation; = 0 – no difference from control was found. One sided T test for mean (< 0 for FGFR2 and > 0 for CACYBP) was applied and returned statistically significant p-values for both of them (0.0258 for FGFR2 and 0.01978 for CACYBP).

Experimental evaluation of DCGs in cervical cancer.

Discussion

In the current study, the differential co-expression analysis [21] was applied to two relatively well-investigated biological systems [28, 29] in order to evaluate the potential importance of genes found using differential correlation analyses. Overall, the obtained results support the idea that DC genes play a regulatory role. While in B cell deficiency DCPs were found highly enriched with immunoglobulin genes (i.e. causal genes for alterations in the gut) we did not observe enrichment for key driver genes in cervical cancers. Rather, DCPs of cervical cancer seem to be located downstream of causal genes. Indeed, those DCPs have been found closer to key regulators than other genes in the network, actually representing “bottlenecks” for communication between driver genes previously published in 29 and the rest of the network ( Figure 4). Furthermore, some differentially co-expressed genes in cervical cancer have been previously implicated in processes such as metastasis, oncogenic autophagy and apoptosis. For example, CACYBP has been shown to promote colorectal cancer metastasis [31], TRPM3 was observed to play a role in oncogenic autophagy in clear cell renal cell carcinoma [32, 33], and AK2 was reported to activate apoptotic pathway [34]. Several genes are investigated for prognostic value for cancers such as myeloma [35], lymphoma [36], breast [37– 41] and gastrointestinal [42, 43] cancers. At least two genes were previously proposed as targets for anti-cancer agents: DHFR [44] and FGFR2 [45]. Moreover, CACYBP and ZSCAN18 were also reported as putative tumor suppressor genes in renal cell carcinoma [30, 46, 47]. In addition, we have tested two DC genes and confirmed their regulatory role (FGFR2 as a tumor suppressor and CACYBP as a potential oncogene in cervical cancer) by manipulating their expression in vitro. Altogether, published observations and our experimental validation for these two genes support the idea that DC genes revealed in the current study play a regulatory role and can be candidate targets for cervical cancer treatment. Interestingly, while in the model of B cell deficiency, the DC genes are highly enriched with causal regulatory genes, there was only one key driver in cervical cancer (CEP70), despite the DC genes in this system still seeming to play a regulatory role overall. Such a difference is potentially related to the fact that the mouse system studied in 28 is highly homogeneous (inbred mice) with only one cause of alterations (i.e. absence of B lymphocytes). Cervical cancer, however, is a heterogeneous system with different chromosomal aberrations and consequently turned-on expression of different driver genes in different patients. Therefore, we can speculate that differential correlations point to regulatory genes that are shared by majority of samples. This hypothesis warrants further investigation, especially considering that DCPs could represent common therapeutic targets for tumors that originated as a result of different genomic or epi-genomic events. In conclusion, this study provided additional evidence for the previously suggested idea [8– 27] that genes presenting alterations in correlation patterns between different phenotypes (i.e. states of biological system) play a critical regulatory role in transitions from one state to another. Furthermore, although our results do not allow for full generalization, they indicate that gain and not loss of correlations connects critical genes involved in transitions to new phenotypes. However, further studies are required to understand how changes in correlation patterns can point to genes with critical capacity to guide a biological system into certain state/phenotype.

Material and methods

Preparation of microarray data

All microarray data were analyzed using BRB Array-Tools developed by the Biometric Research Branch of the National Cancer Institute under the direction of R. Simon ( http://linus.nci.nih.gov/BRB-ArrayTools.html). Array data were filtered to limit analysis to probes with greater than 50% of samples showing spot intensities of >10 and spot sizes >10 pixels, and a median normalization was applied. Same as in cervical cancer [29]. The data were analyzed using BRB Array-Tools using the original normalization used in three studies [50– 52] and median normalization over entire the array for the fourth study [53]. For all studies, we only considered genes found in at least 70% of arrays.

Filtering and meta-analysis of microarray data

In every analysis (DEGs, DCPs and networks), filter of direction (same sign of correspondent parameter – difference of mean, difference of correlation, correlation and partial correlation) was required in a fixed number of datasets (2 out of 2 in BcKO and 3 out of 5 in cervical cancer). Then meta-analysis was done through Fisher combined probability test [54]. Next, the pairs with false discovery rate (fdr) [55] lower than a threshold are chosen. At last, only the pairs that pass PUC [56] are considered correlated and therefore represent edges in the network.

Analysis of microarray data

DEGs between groups of samples were identified by random variance paired t-test p-value lower than 5% with adjustment for multiple hypotheses by setting the fdr below 10% in BRB Array-Tools . Co-expression networks (BcKO and Control) were inferred through Pearson correlation with p-value < 20% and fdr adjustment below 2.5%. DCPs were calculated for pairs that were initially correlated (p-value < 20%) in at least one state. Then differences of Pearson correlation were tested following [21] with a p-value below 10% and fdr < 2%. At last only the DCPs that showed up in one of the networks were selected. DEGs were retrieved from a cervical cancer paper [29]. Correlation networks and DCPs followed the same procedure and in BcKO but with different p-values (correlation p-value < 10% with fdr < 10 -8 and difference of correlation p-value < 10% with fdr < 0.25%). Partial correlation was computed using local partial correlation method [30]. The initial significance was p-value lower than 40% and then fdr < 5%. For more details about the thresholds used, see Table S3 and Table S4.

Local partial correlation network

Two aspects of cervical cancer data motivated us to use local partial correlation for this system. First of all, we have more samples throughout five datasets (see Supplementary Table S1 and Supplementary Table S2) which allows us to have more confidence in our results and second we already know that tumors in general present heterogeneous causal factors. The partial correlation approach gives us the alternative to only consider edges that represent direct regulatory relations. In this paper we used the new approach developed in 30 called local partial correlation. This approach was elaborated specially for cases when there are more variables than samples, which happens regularly in genetics and is a serious problem in classical statistics. First we calculate the correlation network. Then for each significantly correlated pair the inverse method is applied exclusively to the correlation sub-matrix formed only by the closest neighbors of the pair along with the genes forming the pair, Figure 6. If the number of closest neighbors is still higher than the number of samples n, then we decreasingly rank the correlations of the neighbors to either genes in the pair and select the first n/2 neighbors. For each sub- matrix, we only keep the partial correlation value regarding the pair that formed that sub- matrix and then calculate its p-value also based on the sub- matrix. R script for calculation is available in Supplementary Material.

Figure 6.

Local partial correlation scheme: we calculate the LPC for pair X 2, X 5, (red nodes/edge).

Local partial correlation scheme: we calculate the LPC for pair X 2, X 5, (red nodes/edge).

The neighborhood of this pair is the set of nodes X 3, X 6, X 8, X 9 (black nodes/edges). X 1, X 4, X 7 (blue nodes) are significantly correlated with the black nodes (blue edges), but not with the red nodes. Thus the inverse method is applied exclusively to the correlation sub-matrix formed only by the genes X 2, X 5, X 3, X 6, X 8, X 9. In correlation matrices the gray entries are statistically non-significant empirical correlations. Partial correlations were estimated only for the significant (Pearson) correlations in co-expression network. Thus the same definition of DCPs (by Pearson correlation) can still represent structural changes as long as it remains present in one of the two networks. Figure 3 illustrates the local partial correlation network for cervical cancer using only tumor data. It has 578 connected nodes and 824 edges.

Minimum shortest path

The shortest path is a method that calculates distances between 2 nodes in a network. It consists of the minimum number of edges connecting 2 nodes. In this case we want to know the minimum number of edges connecting one node, either DCP gene or not, to a group of nodes: the key drivers Figure 7. For each gene we calculate the shortest path to all key drivers and get the minimum value. Then we compare the minimum shortest path to key drivers coming from DCP genes and the remaining genes. Figure 4A shows that the minimum shortest path to key drivers tend to be smaller when originated in DCP genes.

Figure 7.

In this example we show how to calculate the distance (length of shortest path) between the gene G 2 and group of genes D 1, D 2, D 3, D 4 (nodes in red).

Bi-partite betweenness centrality

Betweenness Centrality measures the node’s centrality in a network by counting the number of shortest paths from all vertices to all other vertices that pass through that node. A gene with high betweenness centrality has a great influence on the transfer of signal through the network Figure 8.

Figure 8.

Here we explain how to calculate bi-partite betweenness centrality (bc) between groups and .

Note that the node D has bigger bc because all shortest paths connecting nodes in group to nodes in group pass through the node D.

Here we explain how to calculate bi-partite betweenness centrality (bc) between groups and .

Note that the node D has bigger bc because all shortest paths connecting nodes in group to nodes in group pass through the node D. However we are interested in the signal passing from key drivers throughout the network. For this reason we decided to apply the measure previously developed by our lab [6] called Bi-partite Betweenness Centrality. It measures the amount of shortest path going from all genes in one group of vertices to all genes in a different group of vertices. In our case, the groups of genes are the key drivers and the peripheral genes (genes connected to only one edge).

Experimental design

FGFR2 and CACYBP knockdown experiment

ME180 cells were transfected with FGFR2-, CACYBP-specific siRNA or control siRNA using Lipofectamine RNAiMAX Transfection Reagent. Cell growth rate during 72h after siRNA transfection was measured using xCelligence system as described below. ME180 cell line was obtained from Dr. Pulivarthi H. Rao. It was cultured in RPMI medium with 10% FBS and 1% Penicillin-Streptomycin added. The cells were seeded at density 4000 cells per well in 96 F-bottom plates (seeding procedure was done according to ATCC protocol for ME180 cell line) and with cell culture media 200 ul per well. 24 hours after seeding, cells were transfected with one of the three siRNA, see Table 2.

Table 2.

Suppliers.

Target	Supplier	Supplier ID
FGFR2	ThermoFisher	s5173
CACYBP	ThermoFisher	s25819
Non-targeting siRNA	Dharmacon	D-001810-01-05

Before transfection, 100 uL of media was taken from each well. Transfection procedure was done according to Lipofectamine RNAiMAX Reagent protocol (Protocol Pub. No. MAN0007825 Rev. 1.0). 3pM of siRNA per well and Lipofectamine 0.6 uL per well were delivered in 20uL. 80 uL of fresh cell culture media was added to each well. Cells were collected 72 h after transfection using Lysis buffer from RNeasy Mini Kit (QIAGEN). RNA extraction was done using RNeasy Mini Kit (QIAGEN) according to the manufacturer’s protocol (no Dnase treatment step was done). Concentrations of RNA measured with Qubit RNA BR Assay Kit. cDNA was done using Bio-Rad iScript cDNA Kit according to the manufacturer’s protocol. Quantitative Real-Time PCR was done for the samples using QuantiFast SYBR Green PCR Kit and GAPDH as a control gene. Primers for the targets you can see in the Table 3.

Table 3.

Primers and Targets.

Target	Forward/ Reverse	Primer sequence (5' -> 3')
FGFR2	Forward	AACAGTTTCGGCTGAGTCCAG
FGFR2	Reverse	GCCCAGTGTCAGCTTATCTCTT
CACYBP	Forward	CTCTGTGGAAGGCAGTTCAAA
CACYBP	Reverse	TCAGGTAATCCCACCTTGTGTT
GAPDH	Forward	GGAGCGAGATCCCTCCAAAAT
GAPDH	Reverse	GGCTGTTGTCATACTTCTCATGG

qRT PCR set up: sample was heated to 95°C, followed by 40 cycles of 95°C for 10 sec and 60°C for 30 sec. CACYBP is up-regulated in tumor tissue, as compared to normal tissue ( Figure 5B). Consequently, if CACYBP has regulatory potential, as predicted by our analysis, it should function as an oncogene promoting cell proliferation. Therefore, the knockdown of this gene should result in a decrease of cell growth/survival. Since FGFR2 was found down-regulated in cervical carcinomas ( Figure 5B) its potential regulatory role would be as a tumor suppressor. Therefore, the knockdown of this gene is expected to increase cell growth. Cell growth was evaluated using xCelligence system (The RTCA DP Instrument) using manufacturer’s protocol. ME180 was cultured in RPMI media with 10% FBS and 1% Penicillin-Streptomycin added. The cells were seeded at density 4000 cells per well (E-Plate 16) in 200 uL of cell culture media. 24 hours after seeding, the experiment was paused for transfecton. Before transfection, 100 uL of media was taken from each well. Transfection procedure was done according to Lipofectamine RNAiMAX Reagent protocol (Protocol Pub. No. MAN0007825 Rev. 1.0). 3pM of siRNA per well and Lipofectamine 0.6 uL per well were delivered in 20uL; 80 uL of fresh cell culture media was added to each well. Plate was placed back in the slot and cell growth was evaluated for another 72 h. To evaluate cell growth rate cell index was transformed into Inhibition index in two steps: Cell indexes for all wells were exported to the excel file. For each treatment (including non-targeting siRNA transfected wells) we extracted cell index average for all wells at 20 h after seeding ( Cell Index Before Treatment) and at 96 h after seeding ( Cell Index After Treatment). To normalize cell index to initial cell number differences for each of the treatments we used the following formula: In next step we normalized each treatment with targeting siRNA to treatment with non-targeting siRNA. For this purpose in each experiment A/B Index from treatment (siRNA targeting either FGFR2 or CACYBP) was normalized to A/B Index from control treatment using the following formula: Final evaluation of growth was done according to the value of Inhibition Index: >0 – there is a decrease in growth; 0 – no difference between treated with targeting and treated with non-targeting siRNA; <0 – there is a growth after treating with targeting siRNA.

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2016 Thomas LD et al. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication). BcKO: Gene expression files containing array data from 28 are available under the GSE23934 superseries in the Gene Expression Omnibus (GEO) data repository. We worked with two groups of samples: B10.A littermates and BALB/C (Table S1). Cervical cancer: We have used the same datasets as in previous study [29] available at GEO: GSE7410 [50], GSE6791 [51], GSE7803 [52], GSE9750 [53], GSE26342 [29] ( Table S21). F1000Research: Dataset 1. Differentially expressed genes from BcKO study, 10.5256/f1000research.9708.d142100 [57] F1000Research: Dataset 2. Differentially correlated pairs from BcKO study, 10.5256/f1000research.9708.d142099 [58] F1000Research: Dataset 3. Causal genes from BcKO study, 10.5256/f1000research.9708.d142097 [59] F1000Research: Dataset 4. Causal genes from cervical cancer study, 10.5256/f1000research.9708.d142098 [60] F1000Research: Dataset 5. Cytoscape Edges and Nodes tables from network in Figure 1, 10.5256/f1000research.9708.d142101 [61] F1000Research: Dataset 6. Cytoscape Edges and Nodes tables from network in Figure 3, 10.5256/f1000research.9708.d142102 [62] F1000Research: Dataset 7. Raw data for Figure 5A,C, 10.5256/f1000research.9708.d142103 [63] The manuscript “Differentially correlated genes in co-expression networks control phenotype transitions” investigates the differentially co-expressed genes in two biological processes, a homogeneous one-causal-factor process (B cell deficiency) and a heterogeneous multi-causal system (cervical cancer). The authors have adopted the Pearson correlation and partial correlation for the inference of networks. Major revision: The networks were inferred from local partial correlation method, which is able to identify a linear relationship between two variables X and Y (genes), and this relationship may or may not be mediate by another gene Z. It is not clear why the authors have adopted the Pearson correlation for B cell deficiency analysis and the partial correlation for cervical cancer analysis. Moreover, it would be interesting to highlight the gain obtained by adopting the partial correlation. For instance, what were the relationships inferred with the partial correlation that would not be inferred using Pearson correlation? Another important issue is that even with partial correlation, only pairwise of relationships are identified. In the study presented at http://dx.doi.org/10.1109/JSTSP.2008.923841, it presents the Intrinsically Multivariate Predictive (IMP) Genes, which are genes that depend on a subset of predictors. How did the authors deal with these IMP genes? It is not clear how and why the microarray data was filtered. The authors could better describe how the data was filtered and how the parameters were adopted. The title “Differentially correlated genes in co-expression networks control phenotype transitions” is too rigid leading to the understanding that all correlated genes control the phenotype transitions. I believe that is not true. Authors could provide a more appropriate title. Minor revisions: Page 3: “homogenous” → homogeneous Page 4: "correlation lost" → correlation loss; I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above. The manuscript "Differentially correlated genes in co-expression networks control phenotype transitions" by Lina Thomas et al, is devoted to describing a case study of two transcriptomic datasets with the focus on characterizing pairs of differentially correlated genes, with a limited experimental validation of the conclusions of the statistical analysis. The manuscript is clear, technically sound and exploits an interesting approach for the analysis of expression data. The conclusions of the statistical analysis are sufficiently justified. I like the detailed and illustrated description of the novel methods exploited the paper. Experimental validation of several findings is a big plus. Therefore, I think the article deserves to be indexed. I have several remarks for the manuscript which I think should be addressed before approval: I am not completely comfortable with the title of the manuscript, which is quite conceptual, while the content of the paper remains descriptive and does not provide mechanistic insight on how DCG pairs can control the phenotype. I suggest to the authors to have some reflexion on how to make it more adequate. Related to 1), in Introduction the desciption of the mechanisms by which DCG pairs can "contribute to propagation of perturbation" remain very illusive. I suggest to the authors to formulate more clearly at least several hypotheses or scenario by which DCG pairs might appear and play an important role. A figure illustrating such hypotheses would clarify what the authors mean. Figures 1 and 3 are not very informative. Can authors make an effort to improve this aspect (at least, visualize some DC gene names?) The authors do not discuss a possibility that appearance of DC pairs can be a result of differential sample tissue composition from several cell types (i.e., immune cells in jejunum or in a tumor tissue). Discussing this point would be an advantage. In several places, the authors apply terms "upstream", "downstream" with respect to the network which is undirected by its nature. I suggest to underline that the nature of correlation networks does not allow distinguishing causality direction and considering genes "downstream" of the key drivers is only a hypothesis which can not be assessed from the data. Description of the transcriptomic datasets in Materials and Methods is too brief, especially for the cervix dataset which seems to be quite composite. It would be appropriate to specify more clearly the dataset's composition (not simply referring to the original publications) directly in the paper text. One thing which is confusing to me is that the correlation networks are constructed differently for two case studies (direct vs partial). I understood the reason why partial correlation was prefered for the cervical cancer study, however, the question is: can the conclusion about that DC pairs do not contain key drivers in the case of cervical cancer be affected by the difference in the methodology of correlation graph computation? It would be usefull to clarify this aspect. The section "Filtering and meta-analysis of microarray data" was not clear to me. I suggest to re-write it. Minor remarks: Page 4: "correlation lost" -> "correlation loss" Description of the partial correlation method refers to a paper (30) which can not be easily accessed. Direct reference to the arXiv preprint would be more appropriate in this case I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above. In the present work Thomas et al. employed method to find differentially co-expressed genes in co-expression networks of a B lymphocyte deficiency (largely mono-causal) and a complex cervical cancer (multi-causal) dataset. They used different graph-theoretical approaches to find relevant genes in this context. Interestingly, the authors found that 72% (39/54) of the correlation gains involve at least one Ig gene, which is in agreement with the previously shown association between intestinal gene expression and B cells ability to produce antibodies. Is it possible that this "correlation gains" are merely a consequence of the general enrichment of Ig genes in the DC list? In the cervical cancer analysis, the authors used shortest-path and betweenness centrality to argue for the regulatory relevance of DC genes. I think it would be great to supplement this finding with more biochemical information. For example, how many of these genes are transcription factors or protein kinases? Overall, I think this study is technically sound and properly executed. == Minor corrections In the abstract, "mus musculus" should read "Mus musculus". In Figure 2, Ig is underlined as if marked by a spellchecker. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

48 in total

1. Finding disease specific alterations in the co-expression of genes.

Authors: Dennis Kostka; Rainer Spang
Journal: Bioinformatics Date: 2004-08-04 Impact factor: 6.937

2. Mutations in fibroblast growth factor receptor 2 and fibroblast growth factor receptor 3 genes associated with human gastric and colorectal cancers.

Authors: J H Jang; K H Shin; J G Park
Journal: Cancer Res Date: 2001-05-01 Impact factor: 12.701

3. Antitumor Effects and Mechanisms of AZD4547 on FGFR2-Deregulated Endometrial Cancer Cells.

Authors: Yeonui Kwak; Hanna Cho; Wooyoung Hur; Taebo Sim
Journal: Mol Cancer Ther Date: 2015-08-20 Impact factor: 6.261

4. Differential coexpression analysis using microarray data and its application to human cancer.

Authors: Jung Kyoon Choi; Ungsik Yu; Ook Joon Yoo; Sangsoo Kim
Journal: Bioinformatics Date: 2005-10-18 Impact factor: 6.937

5. Calcyclin-binding protein inhibits proliferation, tumorigenicity, and invasion of gastric cancer.

Authors: Xiaoxuan Ning; Shiren Sun; Liu Hong; Jie Liang; Lili Liu; Shuang Han; Zhiguo Liu; Yongquan Shi; Yuan Li; Weiqin Gong; Shanhong Zhang; Yu Chen; Xueyan Guo; Yi Cheng; Kaichun Wu; Daiming Fan
Journal: Mol Cancer Res Date: 2007-12 Impact factor: 5.852

6. Identification of copy number gain and overexpressed genes on chromosome arm 20q by an integrative genomic approach in cervical cancer: potential role in progression.

Authors: Luigi Scotto; Gopeshwar Narayan; Subhadra V Nandula; Hugo Arias-Pulido; Shivakumar Subramaniyam; Achim Schneider; Andreas M Kaufmann; Jason D Wright; Bhavana Pothuri; Mahesh Mansukhani; Vundavalli V Murty
Journal: Genes Chromosomes Cancer Date: 2008-09 Impact factor: 5.006

Review 7. Cancer genomics and genetics of FGFR2 (Review).

Authors: Masaru Katoh
Journal: Int J Oncol Date: 2008-08 Impact factor: 5.650

8. Construct and Compare Gene Coexpression Networks with DAPfinder and DAPview.

Authors: Jeff Skinner; Yuri Kotliarov; Sudhir Varma; Karina L Mine; Anatoly Yambartsev; Richard Simon; Yentram Huyen; Andrey Morgun
Journal: BMC Bioinformatics Date: 2011-07-14 Impact factor: 3.307

9. Gene network reconstruction reveals cell cycle and antiviral genes as major drivers of cervical cancer.

Authors: Karina L Mine; Natalia Shulzhenko; Anatoly Yambartsev; Mark Rochman; Gerdine F O Sanson; Malin Lando; Sudhir Varma; Jeff Skinner; Natalia Volfovsky; Tao Deng; Sylvia M F Brenna; Carmen R N Carvalho; Julisa C L Ribalta; Michael Bustin; Polly Matzinger; Ismael D C G Silva; Heidi Lyng; Maria Gerbase-DeLima; Andrey Morgun
Journal: Nat Commun Date: 2013 Impact factor: 14.919

10. The novel colorectal cancer biomarkers CDO1, ZSCAN18 and ZNF331 are frequently methylated across gastrointestinal cancers.

Authors: Hege Marie Vedeld; Kim Andresen; Ina Andrassy Eilertsen; Arild Nesbakken; Raquel Seruca; Ivar P Gladhaug; Espen Thiis-Evensen; Torleiv O Rognum; Kirsten Muri Boberg; Guro E Lind
Journal: Int J Cancer Date: 2014-06-28 Impact factor: 7.396

4 in total

1. Using transcriptomics to predict and visualize disease status in bighorn sheep (Ovis canadensis).

Authors: Lizabeth Bowen; Kezia Manlove; Annette Roug; Shannon Waters; Nate LaHue; Peregrine Wolff
Journal: Conserv Physiol Date: 2022-07-03 Impact factor: 3.252

2. An eco-evo-devo genetic network model of stress response.

Authors: Li Feng; Tianyu Dong; Peng Jiang; Zhenyu Yang; Ang Dong; Shang-Qian Xie; Christopher H Griffin; Rongling Wu
Journal: Hortic Res Date: 2022-06-07 Impact factor: 7.291

Review 3. The Genomic Physics of COVID-19 Pathogenesis and Spread.

Authors: Ang Dong; Jinshuai Zhao; Christopher Griffin; Rongling Wu
Journal: Cells Date: 2021-12-28 Impact factor: 6.600

4. Transkingdom network reveals bacterial players associated with cervical cancer gene expression program.

Authors: Khiem Chi Lam; Dariia Vyshenska; Jialu Hu; Richard Rosario Rodrigues; Anja Nilsen; Ryszard A Zielke; Nicholas Samuel Brown; Eva-Katrine Aarnes; Aleksandra E Sikora; Natalia Shulzhenko; Heidi Lyng; Andrey Morgun
Journal: PeerJ Date: 2018-09-19 Impact factor: 2.984

4 in total