Wen Chen, Jing Li, Shulan Huang, Xiaodeng Li, Xuan Zhang, Xiang Hu, Shuanglin Xiang, Changning Liu.
Abstract
Gene co-expression network analysis has been widely used in gene function annotation, especially for long noncoding RNAs (lncRNAs). However, effective cross-platform analysis tools are lacking. To let biologists easily build a gene co-expression network and predict gene function, we developed GCEN, a cross-platform command-line toolkit written in C++. It is an efficient and easy-to-use solution that allows users to perform gene co-expression network analysis without sophisticated programming skills, especially for RNA-Seq research and lncRNA function annotation. Because of its modular design, GCEN can be easily integrated into other pipelines.
Keywords: KEGG; RNA-Seq; gene co-expression network; gene ontology; lncRNAs annotation
Year: 2022 PMID: 35723358 PMCID: PMC9164028 DOI: 10.3390/cimb44040100
Source DB: PubMed Journal: Curr Issues Mol Biol ISSN: 1467-3037 Impact factor: 2.976
Figure 1. The recommended pipeline of GCEN. The recommended pipeline consists of four parts: data pretreatment, network construction, module identification, and function annotation.
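The network construction step of such a pipeline can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a fixed absolute-correlation cutoff; the function names, the cutoff of 0.9, and the toy expression values are hypothetical and are not taken from GCEN.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def build_network(expression, cutoff=0.9):
    """Return (gene1, gene2, r) edges whose |r| reaches the cutoff.

    `expression` maps a gene ID to its list of expression values.
    The cutoff is an illustrative assumption, not a GCEN default.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= cutoff:
            edges.append((g1, g2, r))
    return edges


# Toy data: geneA and geneB rise together, geneC does not.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = build_network(expression, cutoff=0.9)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

Each pair is evaluated independently, so edges can be emitted one at a time without holding a full gene-by-gene correlation matrix in memory.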
Figure 2. Gene function annotation methods. (a) Network-based function annotation. The function of an unknown gene (red box) is inferred from the functions of its neighbours (red circles on a green background) in the network. (b) Module-based function annotation. The function of an unknown gene (red box) is inferred from the functions of the other genes (red circles on a green background) in the module. (c) Random walk with restart (RWR). Information flows through the network from the seed node (dark green) until convergence. Every node receives some share of the information, and nodes strongly associated with the seed node (light green) rank higher.
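The random walk with restart in panel (c) can be sketched as a simple fixed-point iteration. This is a minimal illustration on an unweighted, undirected toy network, not GCEN's implementation; the restart probability of 0.5, the tolerance, and the node names are assumptions chosen for the example.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    `adjacency` maps every node to a list of its neighbours (undirected,
    so each edge appears in both lists). W spreads a node's probability
    evenly over its neighbours. All parameter values are illustrative.
    """
    nodes = sorted(adjacency)
    p0 = {n: 1.0 if n == seed else 0.0 for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # An isolated node keeps its own mass so the total stays 1.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy network: g1 and g2 are adjacent to the seed, g3 and g4 are farther away.
adjacency = {
    "seed": ["g1", "g2"],
    "g1": ["seed", "g2"],
    "g2": ["seed", "g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(adjacency, seed="seed")
ranked = sorted((n for n in scores if n != "seed"), key=scores.get, reverse=True)
print(ranked)
```

Nodes close to the seed accumulate more probability than distant ones, which is the ranking behaviour the caption describes.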
Time and memory consumption tests for network construction.
| Gene Number | GCEN | FastGCN | WGCNA |
|---|---|---|---|
| 10k | 9.51 s/5.93 MiB | 16.98 s/1.31 GiB | 59.36 s/1.84 GiB |
| 20k | 37.86 s/8.50 MiB | 2 m 11.59 s/5.25 GiB | 3 m 47.15 s/6.36 GiB |
| 40k | 2 m 31.42 s/12.88 MiB | 24 m 23.33 s/21.12 GiB | 14 m 57.86 s/24.39 GiB |
| 80k | 10 m 7.70 s/21.58 MiB | Out of maximum memory | 59 m 53.82 s/96.11 GiB |
These tests were run on a personal computer with an Intel Core i5-10400 processor (6 cores/12 threads) and 128 GB of memory. The version of GCEN was 0.5.1, the version of FastGCN was v1.1, and the version of WGCNA was 1.69. The test data were randomly generated numbers between 0 and 1, with 20 expression values per gene. All tests were run in single-thread mode, although GCEN supports multi-threading. The test data and scripts can be found on our website (https://www.biochen.com/gcen/static/benchmark.zip, accessed on 13 March 2022).
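Test data of the shape described in this note (uniform random values between 0 and 1, 20 values per gene) could be produced along the following lines. This is only a sketch: the gene naming scheme, the tab-separated layout, and the fixed random seed are assumptions, and the authors' actual scripts are the ones in the benchmark archive.

```python
import random


def generate_expression_rows(n_genes, n_values=20, seed=0):
    """Yield (gene_id, values) pairs with uniform random values in [0, 1).

    The gene IDs and the fixed seed are hypothetical choices that make
    the example reproducible.
    """
    rng = random.Random(seed)
    for i in range(n_genes):
        values = [rng.random() for _ in range(n_values)]
        yield f"gene{i + 1}", values


def format_row(gene_id, values):
    """Format one gene as a tab-separated line (assumed layout)."""
    return gene_id + "\t" + "\t".join(f"{v:.6f}" for v in values)


# A tiny instance; the benchmark sizes in the table range from 10k to 80k genes.
rows = list(generate_expression_rows(n_genes=5))
lines = [format_row(gene_id, values) for gene_id, values in rows]
print(lines[0])
```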
Figure 3. Examples of data visualization. (a) Network degree distribution. (b) Sub-network or module. (c) GO annotation (top 10 biological process terms). The dataset was derived from an RNA-Seq experiment on 17 zebrafish samples [21].