Yan Chen1, Yining Liu2, Min Du3, Wengang Zhang1, Ling Xu1, Xue Gao1, Lupei Zhang1, Huijiang Gao1, Lingyang Xu1, Junya Li1, Min Zhao4.
Abstract
Integrating genomic information into cattle breeding is an important approach to exploring genotype-phenotype relationships for complex traits related to dairy and meat production. To assist with genomic-based selection, a reference map of the interactome is needed to fully understand and identify the functionally relevant genes. To this end, we constructed a co-expression analysis of 92 tissues, which represents a systematic exploration of gene-gene relationships in Bos taurus. Using robust WGCNA (Weighted Gene Correlation Network Analysis), we described the gene co-expression network of the 5,000 protein-coding genes with the greatest variation in expression across 92 tissues. Further module identification found 55 highly organized functional clusters representing diverse cellular activities. To demonstrate the re-use of our interaction network for functional genomics analysis, we extracted a sub-network associated with DNA binding genes in Bos taurus. The sub-network was enriched for regulation of transcription from RNA polymerase II promoter, representing central cellular functions. In addition, we identified 28 novel linker genes associated with more than 100 DNA binding genes. Our WGCNA-based co-expression network reconstruction will be a valuable resource for exploring the molecular mechanisms of incompletely characterized proteins and for elucidating larger-scale patterns of functional modularization in the Bos taurus genome.
Keywords: Bos taurus; Co-expression; Functional enrichment; Network; Systems biology; WGCNA
Year: 2017 PMID: 29226034 PMCID: PMC5719962 DOI: 10.7717/peerj.4107
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Determination of the power β value based on the adjacency matrix using weighted gene correlation network analysis (WGCNA).
The adjacency matrix from the co-expression data was weighted by raising the correlation between each pair of genes to a power; i.e., $a = |S|^{\beta}$. The power β was determined by the scale-free topology criterion. To ensure that the average connectivity of the network is smooth, we chose β = 4 based on both charts: (A) the scale-free topology fit and (B) the mean connectivity.
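The weighting described in this caption can be illustrated with a minimal Python sketch. It raises absolute correlations to the power β and prints the mean connectivity for a few candidate powers. The toy correlation matrix, the function names, and the choice to zero the diagonal are assumptions made for illustration; the original analysis used the WGCNA R package, not this code.

```python
def soft_threshold_adjacency(corr, beta):
    """Return a_ij = |s_ij| ** beta, with self-connections set to 0."""
    n = len(corr)
    return [
        [0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def mean_connectivity(adj):
    """Average over genes of the connectivity k_i = sum_j a_ij."""
    return sum(sum(row) for row in adj) / len(adj)


# Hypothetical 3-gene correlation matrix (not data from the study).
toy_corr = [
    [1.0, 0.9, -0.5],
    [0.9, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
]

# Higher powers suppress weak correlations, so mean connectivity drops.
for beta in (1, 2, 4, 6):
    adj = soft_threshold_adjacency(toy_corr, beta)
    print(beta, round(mean_connectivity(adj), 4))
```

In practice the power is usually picked as the smallest value at which the scale-free fit levels off while the mean connectivity stays reasonable, which is the trade-off panels (A) and (B) display.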
Figure 2. The WGCNA analysis of the top 5,000 genes with the most variation across 92 tissues in Bos taurus.
(A) Functional modules are illustrated with different colours. The parameter deepSplit = 4 was set in the WGCNA analysis, which provides high sensitivity to cluster splitting. We additionally required each gene module to contain 30 or more genes. In total, 4,950 genes were grouped into 56 modules, shown in various colours. The top five modules ordered by number of genes were: turquoise with 212 genes; blue with 201 genes; brown with 187 genes; yellow with 162 genes; and green with 155 genes. The grey colour on the left of the figure represents the 50 genes not associated with any module. (B) The relationship tree for all the modules is presented, with the top five modules marked by their corresponding numbers.
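The gene pre-selection step mentioned in the figure title (keeping the genes with the most variation across tissues) might look like the following Python sketch. The text does not state which measure of variation was used, so ranking by population variance is an assumption, as are the function name and the toy expression values.

```python
from statistics import pvariance


def top_variable_genes(expression, n_top):
    """Rank genes by variance across tissues and keep the n_top highest.

    expression: dict mapping gene name -> list of expression values,
    one value per tissue.
    """
    ranked = sorted(
        expression, key=lambda gene: pvariance(expression[gene]), reverse=True
    )
    return ranked[:n_top]


# Hypothetical expression values for three genes in four tissues.
toy_expression = {
    "geneA": [1.0, 1.1, 0.9, 1.0],
    "geneB": [0.0, 5.0, 10.0, 2.0],
    "geneC": [3.0, 3.5, 2.5, 3.0],
}

print(top_variable_genes(toy_expression, 2))
```

With real data the same call would use `n_top=5000` on a gene-by-tissue table with 92 columns.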
The enriched KEGG pathways for the genes in modules 1 and 2 from WGCNA analysis.
| Pathway | # of genes | Q-value* |
|---|---|---|
| Metabolism | 43 | 6.26E−07 |
| Isoleucine degradation | 3 | 0.04218 |
| Collagen formation | 14 | 4.54E−12 |
| Extracellular matrix organization | 21 | 4.92E−12 |
| Collagen biosynthesis and modifying enzymes | 13 | 1.54E−11 |
| ECM proteoglycans | 11 | 2.85E−10 |
| Collagen degradation | 10 | 7.38E−09 |
| Assembly of collagen fibrils and other multimeric structures | 9 | 3.76E−08 |
| Degradation of the extracellular matrix | 12 | 5.32E−08 |
| Integrin cell surface interactions | 11 | 2.07E−07 |
| NCAM1 interactions | 6 | 8.55E−06 |
| Glycosaminoglycan metabolism | 9 | 0.00409 |
| MET activates PTK2 signaling | 5 | 0.00967 |
| Cooperation of PDCL (PhLP1) and TRiC/CCT in G-protein beta folding | 5 | 0.01919 |
| Non-integrin membrane-ECM interactions | 5 | 0.02361 |
| Axon guidance | 14 | 0.04552 |
Notes.
* Q-values: the raw P-values of the hypergeometric test were corrected by Benjamini–Hochberg multiple testing correction.
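The note above names two standard steps: a hypergeometric enrichment test followed by Benjamini–Hochberg correction. A minimal, standard-library Python sketch of both is given below. The function names and the example numbers are illustrative assumptions and do not reproduce the values in the table.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the pathway."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical raw P-values for four pathways.
raw_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw_p))
```

The adjusted values are never smaller than the raw ones, so a pathway listed with a Q-value below 0.05 also passed the uncorrected test.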
The enriched biological processes GO terms for the genes in the top five modules from WGCNA analysis.
| Modules | GO: biological process | Q-value* |
|---|---|---|
| M1 | Small molecule metabolic process | 0.000971 |
| M1 | Carboxylic acid metabolic process | 0.00332 |
| M1 | Oxoacid metabolic process | 0.003628 |
| M1 | Organic acid metabolic process | 0.005205 |
| M1 | Single-organism metabolic process | 0.041382 |
| M2 | Extracellular matrix organization | 0.000392 |
| M2 | Extracellular structure organization | 0.000427 |
| M2 | Protein heterotrimerization | 0.000438 |
| M2 | Collagen fibril organization | 0.000636 |
| M2 | Protein trimerization | 0.004188 |
| M3 | Cell projection organization | 0.013119 |
| M3 | Microtubule cytoskeleton organization | 0.028215 |
| M3 | Microtubule-based process | 0.045173 |
| M3 | Nervous system development | 0.04747 |
| M4 | Pigment cell differentiation | 0.006709 |
| M4 | Regulation of pigment cell differentiation | 0.008956 |
| M4 | Developmental pigmentation | 0.024965 |
| M4 | Melanocyte differentiation | 0.026407 |
| M5 | Sertoli cell development | 0.00372 |
Notes.
* Q-values: the raw P-values of the hypergeometric test were corrected by Benjamini–Hochberg multiple testing correction.
Figure 3. The co-expression network and gene ontology analysis of 340 genes with 100 or more connections.
(A) Co-expression network from WGCNA, based on topological overlap matrix (TOM) values greater than 0.1; (B) the distribution of the number of connections for all the nodes in the network; and (C) short path length frequency for the network. The scatterplot (D) shows the gene ontology (GO) cluster representatives for the 340 genes in a two-dimensional space derived by applying multidimensional scaling to a matrix of the GO terms' semantic similarities. Bubble colour indicates the corrected P-value of the GO term.
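To make the TOM cutoff in panel (A) and the connection counts in panel (B) concrete, the sketch below computes the standard unsigned topological overlap from an adjacency matrix and counts, for each gene, the neighbours whose overlap exceeds a cutoff. The function names and the toy adjacency values are assumptions for illustration; the published network was built with the WGCNA R package.

```python
def topological_overlap(adj):
    """Unsigned TOM: (shared + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where shared = sum over u (u != i, j) of a_iu * a_uj and
    k_i = sum over u (u != i) of a_iu."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0
                continue
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j)
            )
            tom[i][j] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j]
            )
    return tom


def degrees_above(tom, cutoff):
    """Number of other genes whose overlap with each gene exceeds cutoff."""
    n = len(tom)
    return [
        sum(1 for j in range(n) if j != i and tom[i][j] > cutoff)
        for i in range(n)
    ]


# Hypothetical symmetric adjacency matrix for three genes.
toy_adj = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

tom = topological_overlap(toy_adj)
print(degrees_above(tom, 0.1))
```

Genes with 100 or more such connections would then be the ones retained for the analysis shown in this figure.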
Figure 4. The sub-network for the DNA binding genes in Bos taurus.
(A) The sub-network extracted for DNA binding genes; (B) the distribution of the number of connections for all the nodes in the network; (C) the short path length frequency for the network.
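One way to extract a seed-based sub-network and look for linker genes is sketched below in Python. This text does not specify how the authors defined a linker gene, so the definition used here (a non-seed gene, scored by the number of distinct seed genes it is directly connected to) is an assumption, as are the function names and the toy gene labels.

```python
def seed_subnetwork(edges, seeds):
    """Keep only the edges that touch at least one seed gene."""
    seeds = set(seeds)
    return [(a, b) for a, b in edges if a in seeds or b in seeds]


def linker_counts(edges, seeds):
    """For each non-seed gene, count the distinct seed genes it touches."""
    seeds = set(seeds)
    partners = {}
    for a, b in edges:
        if a in seeds and b not in seeds:
            partners.setdefault(b, set()).add(a)
        elif b in seeds and a not in seeds:
            partners.setdefault(a, set()).add(b)
    return {gene: len(found) for gene, found in partners.items()}


# Hypothetical edge list; TF1-TF3 stand in for DNA binding genes.
toy_edges = [
    ("TF1", "X"),
    ("TF2", "X"),
    ("TF1", "TF2"),
    ("X", "Y"),
    ("TF3", "Y"),
]
toy_seeds = ["TF1", "TF2", "TF3"]

print(seed_subnetwork(toy_edges, toy_seeds))
print(linker_counts(toy_edges, toy_seeds))
```

Applying a threshold to the counts (for example, more than 100 seed partners) would give a candidate list of linker genes under this assumed definition.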