
DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data.

Bao-Hong Liu, Hui Yu, Kang Tu, Chun Li, Yi-Xue Li, Yuan-Yuan Li.

Abstract

SUMMARY: Gene coexpression analysis was developed to explore gene interconnection at the expression level from a systems perspective, and differential coexpression analysis (DCEA), which examines the change in gene expression correlation between two conditions, was accordingly designed as a complementary technique to traditional differential expression analysis (DEA). Since there is a shortage of DCEA tools, we implemented in an R package 'DCGL' five DCEA methods for identification of differentially coexpressed genes and differentially coexpressed links, including three currently popular methods and two novel algorithms described in a companion paper. DCGL can serve as an easy-to-use tool to facilitate differential coexpression analyses.

CONTACT: yyli@scbit.org and yxli@scbit.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2010        PMID: 20801914      PMCID: PMC2951087          DOI: 10.1093/bioinformatics/btq471

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

From the perspective of systems biology, gene coexpression analysis is useful for investigating gene interconnection at the expression level. Differential coexpression analysis (DCEA), which examines the change in expression correlation of gene pairs between two conditions, helps to explore the global transcriptional mechanisms underlying phenotypic changes. Compared with traditional differential expression analysis (DEA), however, the development of DCEA tools has lagged behind. In this work, we developed an R package, DCGL, implementing three previously proposed DCEA methods and two new algorithms reported in a companion paper (Yu,H. et al., submitted for publication). Log Ratio of Connections (LRC) calculates the logarithm of the ratio of the connectivities of a gene between two conditions (Reverter et al., 2006). Average Specific Connection (ASC) counts the ‘specific connections’ that exist in only one of the two coexpression networks (Choi et al., 2005). Weighted gene coexpression network analysis (WGCNA) weights links with correlation coefficients and compares the sums of the correlation coefficients of a gene (Mason et al., 2009; van Nas et al., 2009). In contrast, our two methods, differential coexpression profile (DCp) and differential coexpression enrichment (DCe), are based on the exact coexpression changes of gene pairs; they can therefore differentiate significant coexpression changes from relatively trivial ones and identify coexpression reversal between positive and negative (Yu,H. et al., submitted for publication). All five methods can identify differentially coexpressed genes (DCGs) from microarray datasets, and DCe can also pick out differentially coexpressed gene pairs, or links (DCLs).
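To make the notion of differential coexpression concrete, the following minimal Python sketch computes the Pearson correlation of one gene pair under two conditions. The expression values are hypothetical and chosen so that both genes have the same mean in both conditions; the code is independent of the DCGL package and only illustrates the kind of change DCEA looks for.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical expression values of two genes, five samples per condition.
gene_a_cond1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b_cond1 = [2.1, 3.9, 6.2, 8.0, 9.8]  # rises with gene A
gene_a_cond2 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b_cond2 = [9.9, 8.1, 6.0, 4.2, 1.8]  # falls as gene A rises

r1 = pearson(gene_a_cond1, gene_b_cond1)
r2 = pearson(gene_a_cond2, gene_b_cond2)

print(f"condition 1: r = {r1:.3f}")
print(f"condition 2: r = {r2:.3f}")
print(f"coexpression change (r2 - r1): {r2 - r1:.3f}")
```

Because the mean expression of each gene is unchanged between the two conditions, a differential expression analysis would not flag either gene, whereas the sign of their correlation reverses.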

2 DESIGN

A typical DCEA workflow involves three successive procedures: gene filtration, link filtration, and DCG and DCL identification. Correspondingly, DCGL consists of three groups of functions (Fig. 1). For gene filtration, one option is based on expression level (expressionBasedfilter) and the other on expression variability (varianceBasedfilter). For link filtration, we provide three functions for setting cutoffs on coexpression values (systematicLinkfilter, percentLinkfilter and qLinkfilter). A gene pair (link) is filtered out if both of its coexpression values for the two conditions are lower than the cutoff.
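The two filtration steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: the helper names (top_variable_genes, keep_link) and the data are hypothetical, genes are ranked by plain sample variance, and the link rule compares absolute correlation values against the cutoff. The DCGL functions themselves may apply different criteria.

```python
import statistics

def top_variable_genes(expr, k):
    """Return the names of the k genes with the largest sample variance.

    expr maps a gene name to its list of expression values.
    """
    ranked = sorted(expr, key=lambda g: statistics.variance(expr[g]), reverse=True)
    return ranked[:k]

def keep_link(r_cond1, r_cond2, cutoff):
    """Keep a link unless BOTH coexpression values fall below the cutoff.

    Assumption: magnitudes (absolute correlations) are compared.
    """
    return abs(r_cond1) >= cutoff or abs(r_cond2) >= cutoff

# Hypothetical expression data: gene -> values across samples.
expr = {
    "g1": [1.0, 1.0, 1.0, 1.0],
    "g2": [1.0, 5.0, 1.0, 5.0],
    "g3": [2.0, 3.0, 2.0, 3.0],
}
selected = top_variable_genes(expr, 2)
print("genes kept:", selected)

# Hypothetical links: gene pair -> (r in condition 1, r in condition 2).
links = {
    ("g1", "g2"): (0.85, 0.10),
    ("g1", "g3"): (0.20, -0.15),
    ("g2", "g3"): (-0.30, -0.90),
    ("g3", "g4"): (0.75, 0.80),
}
cutoff = 0.7
kept = {pair: rs for pair, rs in links.items() if keep_link(rs[0], rs[1], cutoff)}
for pair, (r1, r2) in kept.items():
    print(pair, r1, r2)
```

Note that a link survives when it is strong in only one condition; such links are precisely the ones most likely to be differentially coexpressed, which is why the rule discards a link only when it is weak in both.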
Fig. 1. DCGL design. Function names are shown in italics.

The third part, which is the core of the package, includes five methods for identifying DCGs and DCLs. These methods differ mainly in how they measure the differential coexpression (dC) of a gene. After the steps of gene filtration and link filtration, suppose gene i is associated with n links whose coexpression values are X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n} for the two conditions. Each method defines its own dC measure on these values. In the DCe measure, N and K indicate the total numbers of links and of DCLs in the coexpression network, respectively, and n and k indicate the numbers of links and DCLs connected to gene i (see Yu,H. et al., submitted for publication). In the WGCNA measure, x′ and y′ are transformed from the original x and y values with a ‘soft-thresholding’ strategy (Mason et al., 2009; van Nas et al., 2009). Link sets C1(i) and C2(i) for the two conditions are determined by screening the coexpression values against a threshold.
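The quantities defined above can be combined in several ways. The following Python sketch shows one plausible reading of three of the measures and should not be taken as the published formulas: it assumes that enrichment of DCLs around gene i is scored as a binomial upper-tail probability with success rate K/N, that the log ratio of connections uses base 10 on the sizes of C1(i) and C2(i), and that specific connections are averaged over the two networks. All function names and numbers are hypothetical.

```python
import math

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

def log_ratio_of_connections(c1, c2):
    """|log10(|C1| / |C2|)| for two non-empty link sets (base 10 is an assumption)."""
    return abs(math.log10(len(c1) / len(c2)))

def average_specific_connections(c1, c2):
    """Mean number of links present in only one of the two networks
    (averaging over the two networks is an assumption)."""
    return (len(c1 - c2) + len(c2 - c1)) / 2

# Hypothetical network-level counts: N links in total, K of them DCLs.
N, K = 10000, 500
# Hypothetical gene i: n links, k of them DCLs.
n, k = 40, 8
p_value = binomial_upper_tail(n, k, K / N)
print(f"enrichment p-value for gene i: {p_value:.2e}")

# Hypothetical neighbour sets of gene i in the two coexpression networks.
c1 = {"g2", "g3", "g4", "g5"}
c2 = {"g4", "g5", "g6", "g7"}
lrc = log_ratio_of_connections(c1, c2)
asc = average_specific_connections(c1, c2)
print(f"log ratio of connections: {lrc}")
print(f"average specific connections: {asc}")
```

In this toy case the gene has the same number of neighbours in both networks, so a connectivity ratio sees no change, while counting specific connections reveals that half of its neighbours were replaced. This is the sense in which the methods differ in their dC measure.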

3 IMPLEMENTATION

DCGL is released as an R package including two gene filtering functions, three link filtering functions and five DCEA functions (Fig. 1). These functions generally expect gene expression matrices (with genes in rows and samples in columns) as the main input, and the final output is a list of genes ranked by dC measure or P-value, from which one can obtain a DCG list. DCe additionally outputs classified DCLs. DCGL can be obtained from the supplementary data to this manuscript, or at http://cran.r-project.org/web/packages/DCGL/index.html. We tested the five DCEA methods using dataset GSE3068 obtained from GEO (Table 1). Note that this test was carried out with the most time-efficient option of link filtration (setting thresholds on coexpression values directly). For the memory analysis, we tested DCp and DCe with the most memory-intensive filtration option, ‘qLinkfilter’. We approached a memory limit of around 5.7 GB at a gene total of 7000. Therefore, if qLinkfilter is invoked, a gene expression matrix should generally undergo a gene filtration step beforehand so that the gene total is cut down to a few thousand or fewer.
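The reason gene filtration matters for memory is that the number of candidate links grows quadratically with the number of genes. The short Python calculation below counts the gene pairs for the subset sizes used in the timing test. The figure of 8 bytes per stored value is an assumption for illustration only; the sketch does not model R's actual memory use or the additional working copies a filtering function may need.

```python
def n_gene_pairs(n_genes):
    """Number of unordered gene pairs (candidate links) among n_genes genes."""
    return n_genes * (n_genes - 1) // 2

BYTES_PER_VALUE = 8  # assumption: one double-precision number per stored value

for n_genes in (1000, 3000, 5000, 7000, 8799):
    pairs = n_gene_pairs(n_genes)
    # Storage for one coexpression value per pair, in mebibytes.
    mib = pairs * BYTES_PER_VALUE / 2**20
    print(f"{n_genes:>5} genes -> {pairs:>12,} pairs, about {mib:,.0f} MiB per condition")
```

Going from 1000 to 7000 genes multiplies the number of pairs by roughly 49, so any per-link intermediate result is multiplied by the same factor.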
Table 1. Execution time (in seconds) of five DCEA methods in handling different subsets of GSE3068

Method   1000   3000   5000   7000   8799
DCp      11065082
DCe      6      38     88     161    257
WGCNA    1.2    9.6    26.4   51     82
ASC      1.2    9.6    26.4   53     86.2
LRC      1      8.4    24.6   48.8   78

Different subsets, with a reduced number of rows, were taken from GSE3068 by favoring genes with top-ranked expression variability. The computing platform was a Linux system with five nodes, each having a dual quad-core Intel Xeon 2.33 GHz CPU and 16 GB of RAM. Execution time was averaged over five repeated runs.


4 EXAMPLE

Three simulated datasets are included in the package for exploring the functions. For example, ‘dataC’ gives expression values of 1000 genes in 20 samples divided equally into two groups corresponding to two conditions. Since this dataset contains a moderate number of genes, the gene filtration step can be skipped. The link filtration procedure is wrapped as a sub-function in the DCEA functions, so one can specify the link filtration choice in the arguments of the DCEA functions. If the DCEA function DCe is called, the result is a variable with four components. The gene names ranked by the dC measure (P-value) make up the first component, ‘$DCGs’, while DCLs of different types are given in the other three components.

Funding: Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences (2008KIP207); the National ‘973’ Basic Research Program (2006CB0D1203, 2006CB0D1205); the National Natural Science Foundation of China (30770497, 31000380); National Key Technologies R&D Program (2007AA02Z331 and 2009ZX10603).

Conflict of Interest: none declared.
REFERENCES

1.  Elucidating the role of gonadal hormones in sexually dimorphic gene coexpression networks.

Authors:  Atila van Nas; Debraj Guhathakurta; Susanna S Wang; Nadir Yehya; Steve Horvath; Bin Zhang; Leslie Ingram-Drake; Gautam Chaudhuri; Eric E Schadt; Thomas A Drake; Arthur P Arnold; Aldons J Lusis
Journal:  Endocrinology       Date:  2008-10-30       Impact factor: 4.736

2.  Simultaneous identification of differential gene expression and connectivity in inflammation, adipogenesis and cancer.

Authors:  Antonio Reverter; Aaron Ingham; Sigrid A Lehnert; Siok-Hwee Tan; Yonghong Wang; Abhirami Ratnakumar; Brian P Dalrymple
Journal:  Bioinformatics       Date:  2006-07-24       Impact factor: 6.937

3.  Differential coexpression analysis using microarray data and its application to human cancer.

Authors:  Jung Kyoon Choi; Ungsik Yu; Ook Joon Yoo; Sangsoo Kim
Journal:  Bioinformatics       Date:  2005-10-18       Impact factor: 6.937

4.  Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells.

Authors:  Mike J Mason; Guoping Fan; Kathrin Plath; Qing Zhou; Steve Horvath
Journal:  BMC Genomics       Date:  2009-07-20       Impact factor: 3.969
