xSyn: A Software Tool for Identifying Sophisticated 3-Way Interactions From Cancer Expression Data.

Baishali Bandyopadhyay, Veda Chanda, Yupeng Wang.

Abstract

BACKGROUND: Constructing gene co-expression networks from cancer expression data is important for investigating the genetic mechanisms underlying cancer. However, correlation coefficients and linear regression models cannot model sophisticated relationships among gene expression profiles. Here, we address the 3-way interaction in which the expression levels of 2 genes are clustered in different space locations under the control of a third gene's expression levels.
RESULTS: We present xSyn, a software tool for identifying such 3-way interactions from cancer gene expression data. It is based on an optimization procedure that uses UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and synergy. Its effectiveness is demonstrated by application to 2 real gene expression data sets.
CONCLUSIONS: xSyn is a useful tool for decoding the complex relationships among gene expression profiles. xSyn is available at http://www.bdxconsult.com/xSyn.html.

Keywords:  3-way interaction; cancer; gene expression; mutual information; optimization; software; synergy

Year:  2017        PMID: 28874883      PMCID: PMC5576537          DOI: 10.1177/1176935117728516

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

Constructing gene co-expression networks from cancer expression data is important for studying the genetic mechanisms underlying cancer.[1,2] Gene co-expression networks frequently differ between normal and cancer samples.[3] Moreover, correlations between gene expression profiles are often dynamic and can even depend on the genotypes of single-nucleotide polymorphisms.[4,5] However, correlation coefficients and linear regression models describe only the general trends among gene expression profiles and thus cannot capture sophisticated interactions among them.

Synergy measures the joint interaction of 2 genes toward explaining the occurrence (or degree of occurrence) of a specific phenotype.[6] It equals the mutual information[7] of the 2 genes’ joint expression profiles with the phenotype minus the sum of the mutual information of each gene’s profile alone with the same phenotype. Formally, synergy is defined as I(G1, G2; C) − [I(G1; C) + I(G2; C)], where I represents mutual information, G1 and G2 are the 2 interacting genes, and C is the phenotype. A positive value indicates that the joint contribution exceeds the sum of the individual contributions, suggesting an interaction between the 2 genes under the phenotype.

Conventionally, the phenotype C is a binary sample status (eg, tumor or normal). In this study, we address the scenario in which C coincides with the high or low expression level of a control gene x (Gx). Because the binary stratification of expression levels may vary among genes, the 3 genes actually comprise a 3-way interaction. High-order gene interactions are widespread in human genomes and influence complex traits.[8] Thus, investigating 3-way interactions, the basic form of high-order interactions, has practical significance for fully understanding high-order gene interactions and genotype-phenotype relationships.
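To make the synergy definition above concrete, the following is a minimal Python sketch, not the xSyn implementation (which is written in C++). It assumes that expression values have already been discretized into labels, and it treats the pair of labels as the joint variable (G1, G2); the function names and the XOR toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def mutual_information(x, c):
    """I(X; C) = H(X) + H(C) - H(X, C) for two aligned label sequences."""
    return entropy(x) + entropy(c) - entropy(list(zip(x, c)))


def synergy(g1, g2, c):
    """I(G1, G2; C) - [I(G1; C) + I(G2; C)] on discretized profiles."""
    joint = list(zip(g1, g2))  # assumption: joint variable = pair of labels
    return mutual_information(joint, c) - (
        mutual_information(g1, c) + mutual_information(g2, c)
    )


# Toy example: the phenotype is the XOR of two binary "genes", so neither
# gene alone is informative, but the pair determines the phenotype.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(g1, g2)]
xor_synergy = synergy(g1, g2, c)
```

In this toy case each gene alone carries no information about C, while the pair determines it completely, so the synergy is positive. Two identical genes that both equal C would instead give a negative value, reflecting redundancy.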

Methods

Sophisticated relationships between 2 interacting genes are modeled using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering algorithm.[9] Synergy computed on top of UPGMA clusters indicates how strongly the points within individual clusters of 2 genes’ expression levels tend to share the same phenotypic class and, in our specific case, the extent to which 2 genes’ expression levels are clustered in different space locations under the control of a third gene’s expression levels. To overcome the computational intractability of enumerating 3-gene combinations, an optimization procedure was designed. First, an optimal synergy is computed for each gene pair via a greedy algorithm that flips sample statuses to increase synergy. If the optimal synergy passes a threshold (e.g., 0.9), all genes are searched for a third gene that shows the highest degree of differential expression under the new sample statuses. We use the new sample statuses to approximate the binary stratification of the control gene’s expression levels, which is acceptable considering the high level of noise in high-throughput expression data. If the identified differential expression is statistically significant, the 3-way interaction is kept as a result. Our algorithm and software are named xSyn. The synergy between 2 gene expression profiles was computed via mutual information and conditional entropy, as previously described.[10]

The detailed algorithm is as follows:

1.  Select a number of genes showing the lowest mutual information (i.e., highest conditional entropy) with the phenotype C.
2.  For each pair Gi and Gj among the selected genes:
    a.  Compute the synergy under the phenotype C (sample statuses).
    b.  Apply an optimization procedure. The procedure starts from the phenotype C, flips the status of one sample at each iteration (the sample whose flip increases the synergy the most), and stops when the synergy converges (i.e., cannot be increased further).
    c.  If the synergy reaches a threshold (e.g., 0.9), retain this gene pair for putative 3-way analysis.
    d.  Retrieve the new sample statuses generated by the optimization procedure.
    e.  Loop through each gene to assess whether that gene could be Gx. Compute the P value of the t test for assessing differential expression under the new sample statuses. If the P value (after Bonferroni correction) is smaller than the cutoff (e.g., .05), output Gi, Gj, and Gx as a 3-way interaction.

The xSyn software package was written in C++ and should be run on Linux operating systems. Before compiling, a compiler supporting C++11 and the GNU Scientific Library (GSL) need to be installed. The “readme.txt” file contains detailed instructions for executing the programs. The input gene expression data file should be tab-delimited, with the first 2 rows specifying the sample names and phenotypes (0 or 1). All subsequent rows should contain expression data for each gene/transcript/probe set.

There are 2 types of output files. The first type records intermediate results, which contain gene pairs, optimal synergy, optimal sample labels, and clusters. A program for assessing the statistical significance of synergy based on permutations is provided. The second type contains the generated 3-way interactions. Script examples for multithreading are provided. We note that the default thresholds for steps 1 and 2c were chosen heuristically based on a single workstation with 20 cores. Users may relax the thresholds to increase solution coverage.
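The greedy optimization procedure described above (flip one sample status per iteration, choosing the flip that increases synergy the most, and stop when no flip helps) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the xSyn C++ code: the cluster labels stand in for UPGMA cluster assignments of the gene pair and are supplied precomputed, the single-gene profiles are assumed to be already discretized, and all names (`greedy_flip`, `clusters`, `tol`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def mutual_information(x, c):
    """I(X; C) = H(X) + H(C) - H(X, C)."""
    return entropy(x) + entropy(c) - entropy(list(zip(x, c)))


def synergy(clusters, g1, g2, status):
    """Synergy with the pair's cluster labels as the joint variable."""
    return mutual_information(clusters, status) - (
        mutual_information(g1, status) + mutual_information(g2, status)
    )


def greedy_flip(clusters, g1, g2, status, tol=1e-12):
    """Flip one sample status per iteration while synergy can be increased."""
    status = list(status)  # work on a copy of the initial phenotype C
    current = synergy(clusters, g1, g2, status)
    while True:
        best_value, best_index = current, None
        for i in range(len(status)):
            status[i] ^= 1  # tentatively flip sample i
            value = synergy(clusters, g1, g2, status)
            status[i] ^= 1  # undo the tentative flip
            if value > best_value + tol:
                best_value, best_index = value, i
        if best_index is None:  # converged: no single flip improves synergy
            return status, current
        status[best_index] ^= 1  # commit the best flip of this iteration
        current = best_value


# Toy data: 8 samples in 4 clusters; the starting statuses follow an XOR
# pattern except for the last sample, which is mislabeled.
g1 = [0, 0, 1, 1, 0, 0, 1, 1]
g2 = [0, 1, 0, 1, 0, 1, 0, 1]
clusters = [2 * a + b for a, b in zip(g1, g2)]  # hypothetical cluster ids
start = [0, 1, 1, 0, 0, 1, 1, 1]
final_status, final_synergy = greedy_flip(clusters, g1, g2, start)
```

Each pass evaluates every single-sample flip, so one iteration costs a synergy evaluation per sample. This per-pair cost is one reason the paper restricts step 1 to a subset of genes and parallelizes across cores.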

Results

We applied the xSyn software to a prostate cancer microarray expression data set.[11] The data set contained 50 normal samples, 52 cancer samples, and 12 625 probe sets. The top 1000 probe sets showing the lowest mutual information with the phenotype were selected to assess 3-way interactions (however, all probe sets were considered for assigning gene x). The synergy cutoff was 0.9, and the P value cutoff of the t test (after Bonferroni correction) was .05. Twenty computer cores were used to parallelize the computation, which completed within 5 days. A total of 2415 3-way interactions were generated. We randomly selected a 3-way interaction for validation. Figure 1 visualizes the effectiveness of xSyn in generating the optimal synergy for a pair of gene expression profiles and identifying the corresponding differential expression of a third gene. In the initial state (Figure 1A), the 2 genes’ expression profiles are clustered, but all clusters contain a mixture of sample statuses. After optimization (Figure 1B), most clusters are filled with a single sample status, and thus, an optimal synergy is achieved. Next, a 3-way interaction is identified: gene x shows differential expression under the new sample statuses generated by the optimization procedure (Figure 1C).

We then applied xSyn to another expression data set. The data set had 80 samples, of which 40 were treated as cancer samples (E2F-null samples).[12] xSyn was executed in single-thread mode with default parameters, and 80 three-way interactions were obtained. A randomly chosen 3-way interaction was visualized, and results similar to Figure 1 were obtained (Figure S1). Collectively, these validations indicate that the xSyn software is effective in identifying the addressed 3-way interactions.
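As a worked illustration of the Bonferroni-corrected cutoff mentioned above, the short Python sketch below shows the arithmetic. It assumes that the correction is applied over the number of candidate genes tested as gene x for a given pair (here, all 12 625 probe sets); the text does not state the exact number of tests used, so this count and the helper names are assumptions.

```python
ALPHA = 0.05      # P value cutoff after Bonferroni correction (from the text)
N_TESTS = 12625   # assumption: one t test per probe set considered as gene x

# Equivalent per-test cutoff on the uncorrected P value.
per_test_cutoff = ALPHA / N_TESTS


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def passes_bonferroni(p_value, n_tests, alpha=ALPHA):
    """True if the adjusted P value is smaller than the cutoff."""
    return bonferroni_adjust(p_value, n_tests) < alpha


print(f"per-test cutoff: {per_test_cutoff:.3e}")
print(passes_bonferroni(1e-6, N_TESTS))  # raw P = 1e-6
print(passes_bonferroni(1e-5, N_TESTS))  # raw P = 1e-5
```

Under this assumption, a candidate gene x needs an uncorrected P value below roughly 4 × 10⁻⁶ to be reported, so a raw P value of 1e-6 passes whereas 1e-5 does not.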
Figure 1.

Visualization of a 3-way interaction identified by the xSyn software. “.” represents sample status “0,” whereas “+” represents sample status “1.” (A) Clusters are mixed with different sample statuses before optimization. (B) Clusters tend to be filled with the same sample status after optimization, and thus, an optimal synergy is achieved. (C) The gene x shows differential expression under the new sample statuses.


Conclusions

xSyn is a software tool for identifying sophisticated 3-way interactions from cancer gene expression data based on optimization algorithms and information theory. The effectiveness of xSyn has been demonstrated by application to 2 gene expression data sets. xSyn is a useful tool for investigating the complex relationships among gene expression profiles in all types of gene expression data such as microarray, RNA-Seq, and array comparative genomic hybridization (CGH).
  10 in total

1.  SNPxGE(2): a database for human SNP-coexpression associations.

Authors:  Yupeng Wang; Sandeep J Joseph; Xinyu Liu; Michael Kelley; Romdhane Rekaya
Journal:  Bioinformatics       Date:  2011-11-30       Impact factor: 6.937

2.  Prediction and genetic demonstration of a role for activator E2Fs in Myc-induced tumors.

Authors:  Kenichiro Fujiwara; Inez Yuwanita; Daniel P Hollern; Eran R Andrechek
Journal:  Cancer Res       Date:  2011-01-18       Impact factor: 12.701

3.  Differential coexpression analysis using microarray data and its application to human cancer.

Authors:  Jung Kyoon Choi; Ungsik Yu; Ook Joon Yoo; Sangsoo Kim
Journal:  Bioinformatics       Date:  2005-10-18       Impact factor: 6.937

4.  Gene expression correlates of clinical prostate cancer behavior.

Authors:  Dinesh Singh; Phillip G Febbo; Kenneth Ross; Donald G Jackson; Judith Manola; Christine Ladd; Pablo Tamayo; Andrew A Renshaw; Anthony V D'Amico; Jerome P Richie; Eric S Lander; Massimo Loda; Philip W Kantoff; Todd R Golub; William R Sellers
Journal:  Cancer Cell       Date:  2002-03       Impact factor: 31.743

Review 5.  Higher-order genetic interactions and their contribution to complex traits.

Authors:  Matthew B Taylor; Ian M Ehrenreich
Journal:  Trends Genet       Date:  2014-10-02       Impact factor: 11.639

6.  Identification of gene interactions associated with disease from gene expression data using synergy networks.

Authors:  John Watkinson; Xiaodong Wang; Tian Zheng; Dimitris Anastassiou
Journal:  BMC Syst Biol       Date:  2008-01-30

7.  Computational analysis of the synergy among multiple interacting genes.

Authors:  Dimitris Anastassiou
Journal:  Mol Syst Biol       Date:  2007-02-13       Impact factor: 11.429

8.  Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types.

Authors:  Yang Yang; Leng Han; Yuan Yuan; Jun Li; Nainan Hei; Han Liang
Journal:  Nat Commun       Date:  2014       Impact factor: 14.919

9.  Gene co-expression analysis for functional classification and gene-disease predictions.

Authors:  Sipko van Dam; Urmo Võsa; Adriaan van der Graaf; Lude Franke; João Pedro de Magalhães
Journal:  Brief Bioinform       Date:  2018-07-20       Impact factor: 11.622

10.  Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data.

Authors:  Mitsunori Kayano; Ichigaku Takigawa; Motoki Shiga; Koji Tsuda; Hiroshi Mamitsuka
Journal:  Bioinformatics       Date:  2009-09-07       Impact factor: 6.937

