
FastGCN: a GPU accelerated tool for fast gene co-expression networks.

Meimei Liang, Futao Zhang, Gulei Jin, Jun Zhu

Abstract

Gene co-expression networks are a valuable type of biological network. Many methods and tools have been published to construct gene co-expression networks; however, most of them are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphics Processing Unit) architectures. Genetic information entropy is exploited to filter out genes with no or small expression changes in the raw-data preprocessing step. Pearson correlation coefficients are then calculated, normalized, and subjected to False Discovery Rate control for the multiple tests. Finally, module identification is conducted to construct the co-expression networks. All of these calculations are implemented on a GPU, and the coefficient matrix is compressed to save space. We compared the performance of the GPU implementation with a multi-core CPU implementation using 16 CPU threads, a single-thread C/C++ implementation and a single-thread R implementation. Our results show that the GPU implementation largely outperforms the single-thread C/C++ and R implementations, and outperforms the multi-core CPU implementation as the number of genes increases. On a test dataset containing 16,000 genes and 590 individuals, the GPU implementation achieved a speedup of more than 63 times over the single-thread R implementation when 50 percent of the genes were filtered out, and about 80 times when no genes were filtered out.


Year:  2015        PMID: 25602758      PMCID: PMC4300192          DOI: 10.1371/journal.pone.0116776

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Gene co-expression analysis provides a comprehensive way to identify interactions between genes and the functions of certain genes, and it is an important step in bioinformatics analyses of gene expression data [1]. Currently, the majority of gene expression data are generated by microarray and RNA-seq methods. Microarray technologies have been widely used over the last few decades; with the emergence of high-throughput next-generation sequencing, RNA-seq technology has been introduced. RNA-seq has several advantages over microarrays and may fully replace them once several issues are overcome [2-4]. Gene co-expression networks are a valuable type of biological network; a single network can cover all human genes [5]. In a gene co-expression network, genes are represented by vertexes, and edges correspond to the pairwise correlations between their expression profiles. Many methods have been published to construct gene co-expression networks [6-9], among which the Pearson correlation coefficient is widely used. However, this method requires a substantial amount of time and space for large datasets [10-12].

Recently, Graphics Processing Units (GPUs) have become a promising tool for processing tasks in parallel. GPUs are designed with hundreds or even thousands of cores but small caches and simple logic control units; this architecture makes the GPU a powerful device for parallel computing rather than for data caching or branch control. Computing Pearson correlation coefficients for co-expression analysis is pairwise and easy to divide into small tasks. These tasks are independent and can be assigned to many-core GPUs for parallel processing, which makes GPU computing well suited to constructing co-expression networks for large datasets [7]. CUDA (Compute Unified Device Architecture) C is a C-extended language developed by NVIDIA for writing programs that run on CUDA-enabled GPUs.
GPUs compute in a SIMT (Single Instruction Multiple Thread) pattern. The code segment launched on a GPU is called a Kernel. When a Kernel runs, the thread space must be defined. GPU threads are organized into Blocks; the number of threads in a Block is limited, and the Block address space has no more than three dimensions. In the same manner, Blocks are organized into a Grid, whose address space also has no more than three dimensions. When the GPU starts computing, Blocks are assigned to SMXs (prior to the Kepler architecture, the SMX was called SM) and run in a SIMT pattern. The number of Blocks running on an SM depends on the number of Registers each thread needs and the size of SMEM (Shared Memory). Threads should be kept busy as much of the time as possible; the Block size and the resources each thread claims determine how fully the hardware is utilized. Variables defined in a Kernel are allocated in Registers automatically, and shared variables or data structures are allocated in SMEM. SMEM and Registers are on the GPU chip and therefore have high access bandwidth; other data are allocated in GMEM (Global Memory). To hide latency, as many warps (32 threads per warp) as possible should be running on each SM (Streaming Multiprocessor). The GPU thread space is created with the syntax <<<grid, block>>>. In general, the grid size should be at least three times the number of SMs, and there should be more than four warps per Block. In recent years, several tools and methods have been introduced to harness GPU power for faster gene expression network analyses [13-17]. Here we propose a computationally efficient tool, FastGCN, to construct gene co-expression networks.
The analysis procedure contains four main steps: preprocessing the input data with genetic information entropy, computing the Pearson correlation coefficient for each gene pair, transforming the coefficients to a normal distribution, and identifying modules. GPU computing was exploited in all four steps. Sparse matrix compression was applied to the correlation coefficient matrix. Our method outperformed traditional gene co-expression analysis methods and is a powerful tool to assist the construction of gene co-expression networks. The source code and the executable binaries for this software are available at https://github.com/DrLiang/FastGCN.

Methods

Data Preprocessing with Genetic Information Entropy

Information entropy was proposed by Shannon [18] and has been widely used to measure the uncertainty of a random variable. After its initial description, information entropy was heavily applied in the life sciences [19-26]. In the present study, we used genetic information entropy to reduce the input data. For a dataset with n individuals and m genes, the genetic information entropy of the i-th gene is defined as

H_i = -\sum_{j=1}^{n} p_{ij} \log p_{ij}, \quad p_{ij} = x_{ij} / \sum_{j=1}^{n} x_{ij},

where x_{ij} is the expression abundance of the i-th gene in the j-th individual. When p_{ij} reaches an equilibrium (uniform) distribution, i.e., the i-th gene shows no differential expression among individuals, the genetic information entropy H_i attains its maximum value. Therefore, we can filter out a portion of the non-differentially expressed genes using the genetic information entropy.
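FastGCN implements this filter in CUDA C; as an illustrative CPU-side sketch (our own Python, not taken from the FastGCN source), the per-gene entropy can be written as:

```python
import math

def gene_entropy(expr):
    """Genetic information entropy of one gene across n individuals:
    p_j = x_j / sum(x);  H = -sum_j p_j * log(p_j)."""
    total = sum(expr)
    h = 0.0
    for x in expr:
        p = x / total
        if p > 0:
            h -= p * math.log(p)
    return h

# A gene expressed uniformly across individuals reaches the maximum
# entropy log(n) and is a candidate for filtering; a differentially
# expressed gene scores lower and survives the cutoff.
flat = [5.0, 5.0, 5.0, 5.0]
varied = [1.0, 9.0, 2.0, 8.0]
print(gene_entropy(flat))                         # ≈ log(4) ≈ 1.386
print(gene_entropy(varied) < gene_entropy(flat))  # True
```

On the GPU, one thread evaluates this loop for one gene, so all m entropies are computed concurrently.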

Pearson Correlation Analysis

For two genes x and y, the Pearson correlation coefficient is

r_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \Big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2},

where x_i and y_i are the expression abundances of genes x and y in the i-th individual, respectively.
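One GPU thread evaluates this formula for one gene pair. A minimal CPU-side sketch of the per-pair computation (illustrative Python, not the CUDA C of the tool):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient for one gene pair across n individuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0: perfectly co-expressed
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # ≈ -1.0: anti-correlated
```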

Normalization of Pearson Correlation Coefficients

For k genes, there are t = k(k − 1)/2 Pearson correlation coefficients. We normalized these coefficients to a test statistic z to test for co-expression between two genes:

z_i = (r_i - \bar{r}) / s,

where \bar{r} is the overall mean of the coefficients and s is their standard deviation. z is then asymptotically distributed as a standard normal, so we can transform the z-value to a P-value. Finally, we used the False Discovery Rate to control the multiple tests.
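A minimal sketch of this normalization step, assuming the Benjamini-Hochberg procedure for the False Discovery Rate control (the paper does not name the exact FDR method, so that choice is ours):

```python
import math

def z_normalize(coeffs):
    """Normalize all t = k(k-1)/2 coefficients against their overall
    mean and standard deviation."""
    t = len(coeffs)
    mean = sum(coeffs) / t
    sd = math.sqrt(sum((r - mean) ** 2 for r in coeffs) / (t - 1))
    return [(r - mean) / sd for r in coeffs]

def z_to_p(z):
    """Two-sided P-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def bh_fdr(pvals):
    """Benjamini-Hochberg q-values (an assumed choice of FDR procedure)."""
    t = len(pvals)
    order = sorted(range(t), key=lambda i: pvals[i])
    q = [0.0] * t
    running = 1.0
    for rank in range(t, 0, -1):    # walk from the largest P-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * t / rank)
        q[i] = running
    return q

z = z_normalize([0.05, 0.10, 0.20, 0.85])
p = [z_to_p(v) for v in z]
print(bh_fdr(p))
```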

Module Identification

With the normalized z as the evaluation criterion, we can easily transform the coefficient matrix into an adjacency matrix. The adjacency function is defined as

a_{ij} = 1 if q_{ij} ≤ τ, and a_{ij} = 0 otherwise,

where q_{ij} is the q-value for False Discovery Rate control and τ is a threshold parameter. Once the adjacency matrix is defined, the node connectivity can be obtained as

c_i = \sum_{j \neq i} a_{ij}.

To select the relevant genes and reduce the computational burden, genes with c_i = 0 are removed. In gene co-expression networks, modules are usually defined as clusters of highly correlated genes [8,27-29]. We implemented module identification based on a topological overlap measure with a node-similarity clustering method. The similarity matrix is defined as

s_{ij} = (h_{ij} + a_{ij}) / (\min(c_i, c_j) + 1 - a_{ij}),

where h_{ij} is the number of shared nodes that connect to both node i and node j, and c_i and c_j are the connectivities of nodes i and j, respectively. The similarity matrix is symmetric. Similar genes form hot blocks along the diagonal, which can be identified as modules.
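A minimal CPU-side sketch of these three quantities, assuming a hard q-value threshold τ and the standard (Zhang-Horvath) form of the topological overlap measure; the exact formulas are our reading of the definitions above, not code from the tool:

```python
def adjacency(qvals, tau):
    """0/1 adjacency matrix: connect gene pairs whose q-value passes tau."""
    k = len(qvals)
    return [[1 if i != j and qvals[i][j] <= tau else 0 for j in range(k)]
            for i in range(k)]

def connectivity(adj):
    """Node connectivity c_i = sum_j a_ij."""
    return [sum(row) for row in adj]

def tom_similarity(adj):
    """Topological-overlap similarity: shared neighbours h_ij plus the
    direct edge, scaled by the smaller connectivity."""
    k = len(adj)
    c = connectivity(adj)
    sim = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            h = sum(adj[i][u] * adj[u][j] for u in range(k))  # shared nodes
            sim[i][j] = (h + adj[i][j]) / (min(c[i], c[j]) + 1 - adj[i][j])
    return sim

# A fully connected triangle: every pair shares one neighbour plus a
# direct edge, so the topological overlap is maximal.
q = [[0.0, 0.01, 0.01],
     [0.01, 0.0, 0.01],
     [0.01, 0.01, 0.0]]
adj = adjacency(q, 0.05)
print(connectivity(adj))          # [2, 2, 2]
print(tom_similarity(adj)[0][1])  # 1.0
```

The inner sum over shared neighbours is a matrix product of the adjacency matrix with itself, which is why the GPU version can delegate it to CUBLAS (see Implementation).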

Implementation

We implemented the algorithms of these four steps using GPU technology with CUDA. The GPU works in a SIMT manner: when the GPU issues one instruction, many threads execute it on different data. These algorithms can exploit the parallel computing power of the GPU because each task is independent and easily assigned to one GPU thread.

Data preprocessing on the GPU

For m genes (or exons) and n individuals in the raw input data, we created m GPU threads to compute the genetic information entropy, one entropy task per GPU thread. Traditionally, data are copied from host memory to GPU device memory before GPU computing. However, this approach has two shortcomings for gene co-expression analysis. First, the raw data can be large while the GPU device memory is relatively small; occasionally, the device memory cannot accommodate all of the input data, intermediate data and result data. Second, transferring data from host memory to GPU device memory is an I/O operation, and I/O operations are inherently slow; to improve performance, they should be reduced as much as possible. To meet these challenges, zero-copy technology was exploited. Zero-copy allows the GPU to access page-locked host memory directly. However, page-locked host memory is not pageable, and overusing it for a long time can degrade the performance of the entire system. The asymptotic time complexity of the entropy computation is O(mn); with GPU computing, this procedure finishes in a short time, after which the occupied page-locked memory can be released. S1a Fig. shows the GPU architecture for the genetic information entropy computation. After the GPU determined the entropies, the CPU took responsibility for filtering the genes accordingly.

Pearson Correlation Analysis on the GPU

The Pearson correlation analysis is pairwise. For the k genes remaining after data preprocessing, t = k(k − 1)/2 coefficients must be computed. The asymptotic time complexity of the Pearson correlation analysis is O(nk²), making it the most time-consuming step of the entire analysis. Generally, the coefficients are stored in a k×k strictly lower (or upper) triangular matrix. To save space, we used a packed matrix storage model, which stores only the coefficients in a vector. To minimize latency and improve performance, as many GPU threads as possible should be created; we created one thread per coefficient, each computing the coefficient of one gene pair. To implement this step, the gene indices must be calculated from the GPU thread identifier; we therefore mapped the one-dimensional GPU thread space onto the two-dimensional input matrix space and the one-dimensional result vector space. The mapping relationship is shown in S2a Fig. The GPU threads read the data from page-locked memory and store the results in GPU device memory; the page-locked memory can then be freed. S1b Fig. shows this pattern.
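The thread-id-to-gene-pair mapping can be sketched as index arithmetic (the closed form below is a standard one for packed triangular storage, ours rather than lifted from the FastGCN source):

```python
import math

def pair_index(t):
    """Map linear thread id t in [0, k(k-1)/2) to the gene pair (i, j),
    i > j, of the strictly lower triangular coefficient matrix."""
    i = (1 + math.isqrt(1 + 8 * t)) // 2
    j = t - i * (i - 1) // 2
    return i, j

def packed_offset(i, j):
    """Position of pair (i, j), i > j, in the packed coefficient vector."""
    return i * (i - 1) // 2 + j

# Round trip for k = 5 genes: thread ids 0..9 cover every pair exactly once.
k = 5
for t in range(k * (k - 1) // 2):
    i, j = pair_index(t)
    assert i > j and packed_offset(i, j) == t
print(pair_index(0), pair_index(9))  # (1, 0) (4, 3)
```

Each GPU thread would run this arithmetic on its own thread id to find which two gene rows to read and which slot of the packed vector to write.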

Normalization of the GPU

We created one GPU thread per coefficient for this normalization, each thread normalizing one coefficient. The standard deviation in Equation 3 is shared by all GPU threads, and having each thread calculate it separately would be inefficient. We therefore used a summation reduction to compute the standard deviation cooperatively across threads (S2b Fig.). In the first step, the first half of the threads each sum a pair of values; at each subsequent step, the number of active threads is halved and the summation is repeated. When only one active thread remains, the final result is obtained. In the GPU memory hierarchy, shared memory is on-chip and much faster than GPU device memory, so we used shared memory to achieve high performance. However, if two or more threads access the same bank of shared memory, a bank conflict occurs and the accesses are serialized. To avoid bank conflicts, we used sequential addressing. In this step, the GPU threads are not independent of one another and must be synchronized; several barriers were added to guarantee correct results. This architecture is shown in S1c Fig.
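A serial sketch of the sequential-addressing summation reduction, where each loop iteration plays the role of one synchronized GPU step (illustrative Python, not the shared-memory CUDA C of the tool):

```python
def tree_reduce_sum(values):
    """Sequential-addressing tree reduction. Each while-iteration is one
    synchronized GPU step: the first `stride` threads each add the element
    `stride` slots away, then the stride halves."""
    buf = list(values)
    stride = len(buf) // 2      # assumes a power-of-two length, as on a GPU block
    while stride > 0:
        for tid in range(stride):   # the active "threads" of this step
            buf[tid] += buf[tid + stride]
        # on the GPU a __syncthreads() barrier sits here
        stride //= 2
    return buf[0]

print(tree_reduce_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

Because thread tid always reads slot tid + stride, consecutive threads touch consecutive shared-memory addresses, which is what makes this addressing pattern bank-conflict free.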

Module Identification on GPU

We implemented module identification with three GPU kernels, again assuming k genes remain. First, k GPU threads were created to compute the adjacency matrix and the node connectivity vector. Then we used CUBLAS, a CUDA library of basic linear algebra subroutines, to calculate the matrix h defined in Equation 6. Finally, we used k×k GPU threads in a two-dimensional thread space to compute the similarity matrix. S1d Fig. shows this architecture.

Results

We have developed four versions of our tool FastGCN: a GPU version, a multi-core CPU version, a single-thread CPU version in C/C++ and a single-thread CPU version in R. The R version was highly optimized; we eliminated explicit loops in all computing procedures. To evaluate the performance of FastGCN for co-expression network inference, we used a GPU workstation consisting of an NVIDIA Tesla K20c card and an Intel Xeon E5-2690 (16 CPU cores) with a clock rate of 2.90 GHz and 256 GB of DDR3 host memory. We conducted several experiments on different datasets to compare the performance of the four versions. These datasets were extracted from the Breast Invasive Cancer (BRCA) expression data from TCGA [30] (https://tcga-data.nci.nih.gov/tcga/), which contain 17,814 genes and 590 individuals; each dataset contains 590 individuals but a different number of genes. We ran two test groups: one with a fifty percent gene cutoff at the data preprocessing stage and one with no gene cutoff. We tested the multi-core CPU version with 16 CPU threads running on 16 CPU cores. The results are summarized in Table 1 and Table 2, respectively; the reported values are the mean running times of 1,000 simulations. The running time covers entropy preprocessing, Pearson correlation calculation, z-score normalization and module identification, excluding the input and output procedures, because we compared only the computational performance of the tool. From Table 1, using the GPU, the entire computational procedure for the dataset with 16,000 genes finished in about 2 seconds, versus 134.1 seconds for the single-thread R version and 20.369 seconds for the single-thread C/C++ version.
Therefore, the GPU implementation was approximately 63 times faster than the traditional serial CPU implementation in R and approximately 10 times faster than the C/C++ CPU implementation. From Table 2, the GPU version achieved speedups of almost 80 times over the single-thread R version. WGCNA [8] is a promising R package for correlation network analysis; its performance was better than the single-thread R implementation but worse than the others. For clarity, we also plotted the speedup curves in Fig. 1A and Fig. 1B. These curves show that when the dataset is small, the multi-core version running on 16 CPU threads outperforms the others; however, as the dataset grows, the GPU version surpasses the multi-core version.
Table 1

Running time (in seconds) for the GPU implementation, Multi-core CPU implementation, Single-thread C/C++ implementation and Single-thread R implementation when 50% of the genes were filtered out during the data preprocessing stage (individual number = 590).

Gene Number   GPU     Multi-core CPU   Single-thread C/C++   Single-thread WGCNA   Single-thread R
2k            0.624   0.094            0.375                 1.622                 1.295
4k            0.671   0.218            1.138                 5.020                 5.101
6k            0.811   0.39             2.403                 14.102                15.163
8k            0.967   0.655            4.447                 25.032                25.740
10k           1.202   0.889            7.301                 41.769                42.260
12k           1.388   1.419            10.811                69.685                69.420
14k           1.747   1.965            14.633                94.731                95.220
16k           2.122   3.010            20.369                130.627               134.100

Multi-core CPU version ran on 16 CPU threads running on 16 CPU cores.

Table 2

Running time (in seconds) for the GPU implementation, Multi-core CPU implementation, Single-thread C/C++ implementation and Single-thread R implementation when no genes were filtered out during data preprocessing stage (individual number = 590).

Gene Number   GPU     Multi-core CPU   Single-thread C/C++   Single-thread WGCNA   Single-thread R
2k            0.655   0.218            1.295                 4.529                 3.947
4k            0.858   0.592            4.384                 24.250                24.020
6k            1.295   1.591            10.311                63.971                65.400
8k            1.950   2.902            19.906                127.037               131.148
10k           3.089   4.68             31.31                 220.112               221.040
12k           4.321   8.236            46.863                322.876               326.34
14k           6.771   10.968           72.758                480.106               484.56
16k           8.003   15.600           85.597                618.084               632.04

Multi-core CPU version ran on 16 CPU threads running on 16 CPU cores.

Figure 1

Curves of speedups against Single-thread R CPU implementation.

(a) Speedup curves when 50% genes were filtered out at data preprocessing stage. (b) Speedup curves when no gene was filtered out at data preprocessing stage.

To illustrate the application of FastGCN to the construction of a co-expression network from gene expression data, we applied it to the Breast Invasive Cancer (BRCA) data from TCGA. The genes in the HTLV-I infection pathway from the KEGG database (http://www.genome.jp/kegg/pathway.html) were included in the analysis. The co-expression network constructed by FastGCN is shown in Fig. 2. We identified three heavily connected hub genes: CRTC1, CD3D and WNT16. These hub genes are of functional importance and have been reported to be associated with mortality [31-34]. Previous studies have also reported that CD3D, LCK and ZAP70 are relatively expressed [35].
Figure 2

The co-expression network reconstructed by FastGCN using BRCA expression data with HTLV-I infection pathway.

Discussion

Numerous tools have been proposed for gene co-expression analysis. However, the majority of them are not popular because they are inconvenient to use and have prohibitive computational times owing to the extremely large number of tests. To address these problems, we have presented a user-friendly, accelerated and optimized tool for gene co-expression analysis. We used a novel entropy-based method for data preprocessing: according to the entropy of each gene, we cut off genes with no or small expression changes. As Table 1 and Table 2 show, this method alleviates the computational burden; because the gene number is reduced, the multiple tests are also partially controlled. Taking advantage of the parallel nature of many-core GPUs, the parallel version of this tool greatly reduces the computational time compared with the traditional single-thread version. GPUs and CPUs have different architectures: GPUs are designed with higher memory bandwidth and more transistors for data-parallel processing but less capability for data caching and branch control than CPUs [36]. Therefore, GPU computing suits programs that have few branches and execute independently on many data elements. We accelerated the entropy calculation, correlation calculation, z-score transformation and module identification on the GPU because these four procedures are independent, parallel and have minimal branches. On the datasets in Table 1 and Table 2, these four procedures complete in a few seconds on a GPU, and we achieved promising speedups over single-thread R and C/C++ implementations on a CPU.
For example, with a dataset of 16,000 genes and 590 individuals, the entire computational procedure finished in about 2 seconds on a GPU, 63 times the speed of the R implementation and 10 times the speed of the C/C++ implementation on a CPU (when 50 percent of the genes are filtered in the data preprocessing step). If no genes are filtered, the entropy calculation is not invoked; on the identical dataset, the computational procedure then finished in 8 seconds on the GPU, approximately 80 times the speed of the R implementation and more than 10 times the speed of the C/C++ implementation. The GPU cannot reach its maximum potential on small datasets, for which the multi-core program is faster. However, the GPU overtakes the multi-core CPU when the dataset exceeds 6,000 genes without gene filtering, or 12,000 genes when half of the genes are filtered. The GPU's lower efficiency on small datasets is mainly due to 1) the scheduling overhead of the GPU kernels, 2) the overhead of transferring data between GPU memory and host memory, and 3) the small GPU thread space created for a small dataset. More precisely, a GPU provides a benefit only when the reduction in computing time exceeds these overhead costs. To improve GPU performance, as many GPU threads as possible should be created to offset the overhead. For a small dataset, the computational work may not be large enough to do so; as the dataset grows, the workload increases, more GPU threads are created, and the GPU performance improves until it overtakes the multi-core CPU, as shown in Fig. 1. For analyzing small datasets, we also provide the multi-core version in our source code package.
To achieve high performance on the GPU, FastGCN was designed to avoid branches, to exploit matrix compression, and to store gene expression data in column-major order so that GPU global memory accesses are coalesced: the data fetched by one global memory access serve as many threads in a warp as possible. The main purpose of this article is to emphasize that the entropy reduction method and GPU parallel computing technology can jointly contribute to gene co-expression analysis. We therefore hope that the tool we developed will be widely used in gene expression analyses.

S1 Fig. GPU architectures implemented in FastGCN.

(a) GPU architecture of the genetic information entropy computation. (b) GPU architecture of the Pearson correlation coefficient computation. (c) GPU architecture of the z-score transformation. (d) GPU architecture of module identification.

S2 Fig. Optimization strategies in the GPU implementation.

(a) Mapping relationship among the one-dimensional GPU thread space, the two-dimensional input matrix space and the one-dimensional coefficient vector space. (b) Summation reduction of the Pearson correlation coefficients on the GPU.
References (30 in total)

1.  Entropy of the genetic information and evolution.

Authors:  M Hasegawa; T A Yano
Journal:  Orig Life       Date:  1975 Jan-Apr

2.  Statistical analysis of the cancer cell's molecular entropy using high-throughput data.

Authors:  Wessel N van Wieringen; Aad W van der Vaart
Journal:  Bioinformatics       Date:  2010-12-20       Impact factor: 6.937

3.  Fast network centrality analysis using GPUs.

Authors:  Zhiao Shi; Bing Zhang
Journal:  BMC Bioinformatics       Date:  2011-05-12       Impact factor: 3.307

4.  International network of cancer genome projects.

Authors:  Thomas J Hudson; Warwick Anderson; Axel Artez; et al. (International Cancer Genome Consortium)
Journal:  Nature       Date:  2010-04-15       Impact factor: 49.962

5.  Increased entropy of signal transduction in the cancer metastasis phenotype.

Authors:  Andrew E Teschendorff; Simone Severini
Journal:  BMC Syst Biol       Date:  2010-07-30

Review 6.  RNA-Seq: a revolutionary tool for transcriptomics.

Authors:  Zhong Wang; Mark Gerstein; Michael Snyder
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

7.  Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression.

Authors:  Zhiao Shi; Catherine K Derow; Bing Zhang
Journal:  BMC Syst Biol       Date:  2010-05-27

8.  Gene regulatory networks inference using a multi-GPU exhaustive search algorithm.

Authors:  Fabrizio F Borelli; Raphael Y de Camargo; David C Martins; Luiz C S Rozante
Journal:  BMC Bioinformatics       Date:  2013-11-05       Impact factor: 3.169

9.  Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types.

Authors:  Yang Yang; Leng Han; Yuan Yuan; Jun Li; Nainan Hei; Han Liang
Journal:  Nat Commun       Date:  2014       Impact factor: 14.919

10.  PLANEX: the plant co-expression database.

Authors:  Won Cheol Yim; Yongbin Yu; Kitae Song; Cheol Seong Jang; Byung-Moo Lee
Journal:  BMC Plant Biol       Date:  2013-05-20       Impact factor: 4.215

