Yeong Jun Koh¹, Seung Hwan Lee², Hyo-Jun Lee³, Yoonji Chung⁴, Ki Yong Chung⁵, Young-Kuk Kim⁶, Jun Heon Lee⁴.
Abstract
In the general framework of weighted gene co-expression network analysis (WGCNA), a hierarchical clustering algorithm is commonly used for module definition. However, hierarchical clustering depends strongly on the topological overlap measure. Consequently, this algorithm may assign two genes with low topological overlap to different modules even when their expression patterns are similar. Here, we propose a novel gene module clustering algorithm for WGCNA. We develop a gene module clustering network (gmcNet), which simultaneously accounts for single-level expression and the topological overlap measure. gmcNet includes a "co-expression pattern recognizer" (CEPR) and a "module classifier". The CEPR incorporates the expression features of single genes into the topological features of co-expressed genes. Given this CEPR-embedded feature, the module classifier computes module assignment probabilities. We validated the performance of gmcNet using 4,976 genes from 20 Korean native cattle. We observed that the CEPR generates more robust features than either single-level expression or the topological overlap measure alone. Using the CEPR-embedded feature, gmcNet achieved the best modularity (0.261) and differentially expressed signal (27.739) among the clustering methods tested. Furthermore, gmcNet identified biological functions associated with carcass weight, backfat thickness, intramuscular fat, and beef tenderness in Korean native cattle. Therefore, gmcNet is a useful framework for module clustering in WGCNA.
Year: 2022 PMID: 35701465 PMCID: PMC9197844 DOI: 10.1038/s41598-022-13796-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Module clustering results. The upper panel displays the hierarchical clustering dendrogram. In the lower panel, the colors show the module memberships determined by the methods listed on the left.
Model performance on the Korean native cattle dataset in terms of graph modularity and DEM signal.
| Metric | HC | K-means | K-medoids | gmcNet |
|---|---|---|---|---|
| Modularity | 0.219 | 0.138 | 0.171 | 0.261 |
| DEM-signal | 18.618 | 22.723 | 18.236 | 27.739 |
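Graph modularity rewards partitions whose within-module edge weight exceeds what a degree-preserving random graph would produce. The excerpt does not say which weighted graph (adjacency or topological overlap matrix) the authors scored, so the sketch below assumes any symmetric, non-negative weight matrix and computes Newman's weighted modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). The function name `weighted_modularity` is hypothetical.

```python
import numpy as np


def weighted_modularity(adj: np.ndarray, labels: np.ndarray) -> float:
    """Newman modularity Q of a partition of a weighted, undirected graph.

    adj: symmetric (n, n) matrix of non-negative edge weights.
    labels: length-n array of module labels.
    """
    two_m = adj.sum()                              # total weight counted twice (2m)
    k = adj.sum(axis=1)                            # weighted degree of each gene
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    expected = np.outer(k, k) / two_m              # null-model weight k_i * k_j / 2m
    return float(((adj - expected) * same).sum() / two_m)


# Two disconnected triangles (unit weights).
adj = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0

print(weighted_modularity(adj, np.array([0, 0, 0, 1, 1, 1])))  # ~0.5: each triangle is its own module
print(weighted_modularity(adj, np.zeros(6, dtype=int)))         # ~0.0: everything in one module
```

Putting every gene in a single module always gives Q = 0, which is why higher modularity indicates that a method found genuinely separated modules.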
Figure 2. First and second principal components of the three feature types in the Korean native cattle dataset, and the clustering results of each method. The x- and y-axes are the first and second principal components. The colors show the module memberships determined by the methods listed at the top.
Figure 3. Search for the optimal k based on the DEM signal and modularity.
Figure 4. The DEM signals of modules defined by gmcNet. The y-axis shows the module names and the number of genes in each module; the x-axis shows the complex traits. The numbers in each cell are regression coefficients (without parentheses) and regression p-values (in parentheses). Red and blue indicate negative and positive coefficients, respectively. * and ** mark levels of statistical significance.
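Figure 4 reports a regression coefficient and p-value for each module-trait pair. The excerpt does not spell out the DEM procedure; the sketch below assumes a common WGCNA-style approach: summarize each module by its eigengene (the first principal component of the module's standardized expression) and regress each trait on it by ordinary least squares. The helpers `module_eigengene` and `trait_regression` and the simulated data are hypothetical.

```python
import numpy as np


def module_eigengene(expr_module: np.ndarray) -> np.ndarray:
    """First principal component (one value per sample) of a module.

    expr_module: (genes_in_module, samples) expression matrix.
    """
    # Standardize each gene across samples.
    z = expr_module - expr_module.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)
    # The first right singular vector is the dominant sample-wise pattern.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # SVD signs are arbitrary: orient the eigengene along the module's average profile.
    if np.dot(eigengene, z.mean(axis=0)) < 0:
        eigengene = -eigengene
    return eigengene


def trait_regression(eigengene: np.ndarray, trait: np.ndarray) -> tuple[float, float]:
    """Simple OLS of trait on eigengene (with intercept); returns (slope, t statistic)."""
    x = eigengene - eigengene.mean()
    y = trait - trait.mean()
    slope = (x @ y) / (x @ x)
    resid = y - slope * x
    dof = len(x) - 2                                   # intercept and slope estimated
    se = np.sqrt((resid @ resid) / dof / (x @ x))      # standard error of the slope
    return float(slope), float(slope / se)


rng = np.random.default_rng(0)
latent = rng.normal(size=20)                           # shared module signal, 20 animals
module = latent + 0.3 * rng.normal(size=(10, 20))      # 10 co-expressed genes
trait = 2.0 * latent + rng.normal(scale=0.5, size=20)  # trait driven by the same signal
me = module_eigengene(module)
print(trait_regression(me, trait))                     # positive slope with a large t statistic
```

A two-sided p-value like those in parentheses would follow from the t statistic with n − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), n - 2)`.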
Figure 5. The biological processes of three significant modules: (a) K1, (b) K5, and (c) K7. p.adjust is the p-value adjusted by the Bonferroni method.
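The Bonferroni rule behind p.adjust multiplies each raw p-value by the number of tests performed and caps the result at 1; the number of terms tested per module is not given in this excerpt. The sketch below mirrors what R's `p.adjust(p, method = "bonferroni")` computes; the function name `bonferroni` is illustrative.

```python
import numpy as np


def bonferroni(pvalues) -> np.ndarray:
    """Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)


print(bonferroni([0.001, 0.02, 0.5]))  # approximately [0.003, 0.06, 1.0]
```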
Figure 6. Hub gene networks of the four principal modules of Korean native cattle: (a) K1, (b) K2, (c) K4, (d) K7. From the outside in, the top 200, top 25, and top 5 hub genes are shown. The linkages of the top 5 hub genes are shown as the edges of the networks.
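Figure 6 ranks genes within each module. The ranking criterion is not stated in this excerpt; the sketch below assumes intramodular connectivity (the summed adjacency of a gene to the other genes in its module), one common WGCNA choice, with module membership (kME) being another. `top_hub_genes` and the toy adjacency are hypothetical.

```python
import numpy as np


def top_hub_genes(adj: np.ndarray, labels: np.ndarray, module, top: int = 5) -> np.ndarray:
    """Indices of the `top` genes with the highest intramodular connectivity in `module`."""
    members = np.flatnonzero(labels == module)
    sub = adj[np.ix_(members, members)]            # adjacency restricted to the module
    k_in = sub.sum(axis=1)                         # intramodular connectivity of each member
    order = np.argsort(-k_in, kind="stable")       # highest connectivity first
    return members[order[:top]]


# Toy module 0 = genes 0-3 with gene 2 as a star centre; gene 4 belongs to module 1.
adj = np.zeros((5, 5))
for j in (0, 1, 3):
    adj[2, j] = adj[j, 2] = 1.0
adj[0, 1] = adj[1, 0] = 0.5
labels = np.array([0, 0, 0, 0, 1])
print(top_hub_genes(adj, labels, 0, top=2))        # [2 0]
```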
Hub genes and associated traits of the main modules.
| Module | Hub gene | Significant trait | Reported cattle traits affected |
|---|---|---|---|
| K1 | | CWT** (0.005) | Growth[ |
| K2 | | BF* (0.023); IMF** (0.0001) | Fat[ |
| K4 | | BF** (0.0002); IMF** (0.0001) | Fat[ |
| K7 | | WBSF** (0.008) | Fat[ |
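Hub gene: top 25 hub genes. Significant trait: traits significantly associated with the module in the DEM analysis (p-values in parentheses). CWT, carcass weight; BF, backfat thickness; IMF, intramuscular fat; WBSF, Warner-Bratzler shear force. The traits reported to be affected by each hub gene are listed in S4 Table.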
Model performance on the GEO datasets in terms of graph modularity and DEM signal.
| Dataset | Metric | HC | K-means | K-medoids | gmcNet |
|---|---|---|---|---|---|
| Human | Modularity | 0.255 | 0.155 | 0.231 | |
| Human | DEM-signal | 21.620 | 24.975 | 22.082 | |
| Mouse | Modularity | 0.174 | 0.088 | 0.146 | |
| Mouse | DEM-signal | 66.505 | 74.591 | 65.494 | |
| Pig | Modularity | 0.122 | 0.177 | 0.132 | |
| Pig | DEM-signal | 14.11 | 20.388 | 14.182 | |
| Chicken | Modularity | 0.328 | 0.334 | 0.265 | |
| Chicken | DEM-signal | 25.95 | 31.709 | 23.65 | |
Significant values are in bold.
Figure 7. Construction of the three topological overlap matrices: one for all relationships, and one each for positive and negative relationships.
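The three matrices in Figure 7 can be sketched as follows, assuming the standard WGCNA soft-threshold adjacency |r|^β (β = 6 is an arbitrary choice here) and the standard topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu. Splitting the correlations by sign before building each TOM is our reading of the caption, not the authors' code; `topological_overlap` and `three_toms` are hypothetical names.

```python
import numpy as np


def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """Standard TOM for a symmetric adjacency with zero diagonal and entries in [0, 1]."""
    shared = adj @ adj                              # l_ij: weight of shared neighbours
    k = adj.sum(axis=1)                             # connectivity k_i
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)                      # a gene fully overlaps itself
    return tom


def three_toms(expr: np.ndarray, beta: float = 6.0):
    """TOMs of all, positive-only, and negative-only relationships.

    expr: (n_genes, m_samples). beta is an assumed soft-threshold power.
    """
    cor = np.corrcoef(expr)                         # gene-by-gene Pearson correlation
    np.fill_diagonal(cor, 0.0)                      # ignore self-correlation
    a_all = np.abs(cor) ** beta
    a_pos = np.where(cor > 0, cor, 0.0) ** beta     # keep only positive correlations
    a_neg = np.where(cor < 0, -cor, 0.0) ** beta    # magnitude of negative correlations
    return topological_overlap(a_all), topological_overlap(a_pos), topological_overlap(a_neg)


rng = np.random.default_rng(1)
signal = rng.normal(size=20)
expr = rng.normal(size=(30, 20))
expr[0] = signal
expr[1] = -signal + 0.05 * rng.normal(size=20)      # gene 1 is anti-correlated with gene 0
tom_all, tom_pos, tom_neg = three_toms(expr)
print(tom_neg[0, 1] > tom_pos[0, 1])                # True: the pair overlaps only in the negative network
```

Because a_pos + a_neg equals a_all entry by entry, the split loses no edges; it only separates which neighbours two genes share through positive versus negative relationships.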
Figure 8. The architecture of gmcNet, showing the single-level expression of n genes in m samples, the CEPR-embedded feature, the assignment probability matrix of n genes to k modules, and the loss function.
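Figure 8 ends in an n × k assignment probability matrix. The excerpt does not describe the module classifier's layers or its loss, so the sketch below assumes the simplest possible form: a hypothetical linear map from a d-dimensional feature per gene to k logits, followed by a row-wise softmax so that each gene's probabilities over the k modules sum to 1. Random arrays stand in for the CEPR-embedded features and learned weights.

```python
import numpy as np


def assignment_probabilities(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Row-wise softmax of features @ weights: (n, d) @ (d, k) -> (n, k)."""
    logits = features @ weights
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)


rng = np.random.default_rng(2)
n_genes, d, k = 8, 4, 3                        # illustrative sizes only
features = rng.normal(size=(n_genes, d))       # stand-in for CEPR-embedded features
weights = rng.normal(size=(d, k))              # stand-in for learned classifier weights
probs = assignment_probabilities(features, weights)
modules = probs.argmax(axis=1)                 # hard module label per gene
print(probs.shape, modules)
```

Taking the row-wise argmax turns the soft assignments into the discrete module memberships shown in the clustering figures.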