Juan A Botía, Jana Vandrovcova, Paola Forabosco, Sebastian Guelfi, Karishma D'Sa, John Hardy, Cathryn M Lewis, Mina Ryten, Michael E Weale.
Abstract
BACKGROUND: Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCNs). WGCNA generates both a GCN and a derived partition of the genes into clusters (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn).
Keywords: Assessment of better gene clusters on bulk tissue; Gene co-expression networks on brain; K-means applied to WGCNA
Year: 2017 PMID: 28403906 PMCID: PMC5389000 DOI: 10.1186/s12918-017-0420-6
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Real names for the tissues used in the UKBEC and GTEx brain tissue experiments
| Short name | UKBEC Tissue name | Samples | Short name | GTEx Tissue name | Samples |
|---|---|---|---|---|---|
| CRBL | Cerebellum | 76 | AMYG | Amygdala | 72 |
| FCTX | Frontal Cortex | 83 | ACCT | Anterior cingulate cortex (BA24) | 84 |
| HIPP | Hippocampus | 86 | CAUD | Caudate (basal ganglia) | 117 |
| MEDU | Medulla | 88 | CEHE | Cerebellar Hemisphere | 105 |
| OCTX | Occipital cortex | 77 | CERE | Cerebellum | 125 |
| PUTM | Putamen | 77 | CTEX | Cortex | 114 |
| SNIG | Substantia nigra | 65 | FCTX | Frontal Cortex (BA9) | 108 |
| TCTX | Temporal Cortex | 72 | HIPP | Hippocampus | 94 |
| THAL | Thalamus | 81 | HYPO | Hypothalamus | 96 |
| WHMT | White matter | 83 | NUAC | Nucleus accumbens (basal ganglia) | 113 |
| | | | PUTM | Putamen | 97 |
| | | | SPIN | Spinal Cord | 71 |
| | | | SNIG | Substantia nigra | 63 |
Fig. 1 The upper plot shows the number of genes moved between any pair of modules (y axis) across k-means iterations (x axis) for the UKBEC microarray dataset. The bottom plot shows, for the same dataset, the average module membership (y axis) of the moved genes (dashed line) across iterations (x axis), compared with the average module membership of all genes (solid line)
Fig. 2 Evolution of the within-cluster distance during the k-means runs for the UKBEC datasets
Fig. 3 Euclidean distance between successive module eigengenes along the k-means iterations for Cerebellum samples in the UKBEC datasets
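The refinement step whose diagnostics Figs. 1 to 3 track (genes moved per iteration, changing centroids) can be illustrated with a minimal sketch. It is not the km2gcn implementation: the function names are hypothetical, each module centroid is the mean expression profile of its genes (a simplified stand-in for the module eigengene), and the distance is assumed to be 1 minus the Pearson correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def refine_partition(expr, labels, max_iter=30):
    """Refine an initial gene-to-module partition with k-means-style steps.

    expr   : list of gene profiles (one list of sample values per gene)
    labels : initial module label per gene (e.g. from a WGCNA run)
    Returns the refined labels and the number of genes moved per iteration.
    """
    modules = sorted(set(labels))
    labels = list(labels)
    centroids = {}
    moved_per_iter = []
    for _ in range(max_iter):
        # Update step: centroid = mean profile of the module's current genes.
        # ASSUMPTION: mean profile used in place of the module eigengene.
        for m in modules:
            members = [g for g, lab in zip(expr, labels) if lab == m]
            if members:  # an emptied module keeps its previous centroid
                centroids[m] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step: move each gene to its closest centroid.
        # ASSUMPTION: distance is 1 - Pearson correlation.
        new_labels = [
            min(modules, key=lambda m: 1.0 - pearson(g, centroids[m]))
            for g in expr
        ]
        moved = sum(a != b for a, b in zip(labels, new_labels))
        moved_per_iter.append(moved)
        labels = new_labels
        if moved == 0:  # converged: no gene changed module
            break
    return labels, moved_per_iter


# Toy data: the third gene follows the rising pattern but starts in module "B".
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [1.1, 2.0, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.5, 2.0],
]
toy_labels, toy_moved = refine_partition(toy_expr, ["A", "A", "B", "B", "B"])
```

The list of moved genes per iteration is the quantity plotted on the y axis of the upper panel of Fig. 1; in this toy case one gene moves in the first iteration and none in the second.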
Number of new modules from a tissue (rows) that are preserved on another tissue (columns) after applying the k-means to the standard WGCNA partitions
(a) UKBEC brain tissues
| Tissue | CRBL | FCTX | HIPP | MEDU | OCTX | PUTM | SNIG | TCTX | THAL | WHMT |
|---|---|---|---|---|---|---|---|---|---|---|
| CRBL | 0 | 1 | 2 | 3 | 0 | 5 | 3 | 1 | 3 | 2 |
| FCTX | 5 | 0 | 1 | 5 | 0 | 6 | 3 | 0 | 5 | 3 |
| HIPP | 4 | 2 | 0 | 3 | 0 | 7 | 1 | 3 | 0 | 0 |
| MEDU | 1 | 2 | 4 | 0 | 3 | 2 | 3 | 2 | 1 | 1 |
| OCTX | 7 | 1 | 3 | 6 | 0 | 9 | 6 | 3 | 8 | 6 |
| PUTM | 3 | 1 | 3 | 2 | 1 | 0 | 1 | 1 | 0 | 2 |
| SNIG | 1 | 2 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| TCTX | 3 | 2 | 1 | 3 | 0 | 1 | 4 | 0 | 2 | 4 |
| THAL | 2 | 2 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | -1 |
| WHMT | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |

(b) GTEx brain tissues
| Tissue | AMYG | ACCT | CAUD | CEHE | CERE | CTEX | FCTX | HIPP | HYPO | NUAC | PUTM | SPIN | SNIG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMYG | 0 | 0 | 14 | 1 | 2 | 5 | 6 | 3 | 10 | 14 | 7 | 7 | 7 |
| ACCT | 0 | 0 | 0 | 3 | 1 | 3 | -1 | 5 | 1 | 0 | 1 | 6 | 2 |
| CAUD | 2 | 3 | 0 | 2 | -1 | 5 | 6 | 5 | 6 | 3 | 1 | 2 | 4 |
| CEHE | 2 | 0 | 1 | 0 | 8 | 0 | 0 | 0 | 4 | 1 | 2 | 0 | 0 |
| CERE | 1 | 2 | 1 | 15 | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| CTEX | 2 | 4 | 0 | 4 | 3 | 0 | 1 | 2 | 4 | 4 | 4 | 6 | 3 |
| FCTX | 4 | 5 | 6 | 1 | 2 | 4 | 0 | 3 | 0 | 1 | 4 | 1 | 2 |
| HIPP | 0 | 4 | 7 | 2 | 4 | 0 | 8 | 0 | 1 | 2 | 2 | 8 | 0 |
| HYPO | 2 | 9 | 7 | 4 | 5 | 6 | 5 | 11 | 0 | 3 | 4 | 1 | 1 |
| NUAC | 12 | 11 | 7 | 9 | 5 | 9 | 7 | 20 | 9 | 0 | 6 | 4 | 7 |
| PUTM | 1 | -3 | 4 | 6 | 1 | 5 | 3 | 8 | 7 | 5 | 0 | 6 | 5 |
| SPIN | 4 | 2 | 1 | -1 | 1 | 6 | 2 | 7 | 9 | 5 | 14 | 0 | 4 |
| SNIG | 15 | 12 | 5 | 0 | 0 | 18 | 4 | 9 | 14 | 7 | 13 | 6 | 0 |
Fig. 4 Performance of standard WGCNA and k-means on 42 simulated data sets that used the GTEx WGCNA GCNs as seeds for simulation. The same results are displayed using three different indexes of similarity between cluster partitions. The k-means method outperforms standard WGCNA on all three indexes
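The caption does not say which three similarity indexes are used. As an illustration of what an index of similarity between two cluster partitions measures, here is a minimal sketch of the plain Rand index, chosen as an assumption and not taken from the paper; the function name is hypothetical.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Fraction of gene pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two genes
    in the same module, or both put them in different modules.
    """
    if len(part_a) != len(part_b):
        raise ValueError("partitions must label the same genes")
    pairs = list(combinations(range(len(part_a)), 2))
    if not pairs:
        return 1.0  # fewer than two genes: trivially identical
    agree = sum(
        (part_a[i] == part_a[j]) == (part_b[i] == part_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same grouping under different label names scores 1.0.
same_grouping = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Crossed groupings agree on only 2 of the 6 gene pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

In a simulation setting like the one in Fig. 4, one argument would be the known simulated partition and the other the partition recovered by a method, so a higher value means better recovery.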
Fig. 5 In the left plot, light blue bars show the percentage of relative improvement achieved by k-means with respect to WGCNA in the S (P) statistic. Values in red (<0%) are those that k-means fails to improve. The right plot shows the improvement in cell type enrichment in the same way, for the 10 UKBEC GCNs and the 13 GTEx brain networks. Again, values in red are those that k-means fails to improve
Fig. 6 Relation between the frequency of appearance of GO annotation terms across all GTEx GCNs and their information content (IC). Terms that appear more often tend to have lower IC. Regression lines show that k-means gets better IC values for highly repeated terms, although the ANOVA test is not significant
Fig. 7 Effect, on a WGCNA partition, of randomly reassigning the genes that k-means selected to move from one module to another. Plot (a) refers to S values and (b) to the number of significant terms