Juan Chen1, Hai-Tao Yang2, Zhu Li3, Ning Xu4, Bo Yu2, Jun-Ping Xu2, Pei-Ge Zhao2, Yan Wang2, Xiu-Juan Zhang2, Dian-Jie Lin5.
Abstract
Studies that assess only differentially expressed (DE) genes do not provide the information required to investigate disease mechanisms. Complete knowledge of the direct and indirect interactions between proteins may serve as an important reference when forming a comprehensive description of cellular mechanisms and functions. However, protein interaction network studies rely on various methods, and their results are often inconsistent. In the present study, a combined network was constructed from selected gene pairs after a novel algorithm converted and combined the gene-pair scores obtained from multiple approaches. Samples from patients with and without lung adenocarcinoma were compared, and the RankProd package was used to identify DE genes. The empirical Bayesian (EB) meta-analysis approach, the search tool for the retrieval of interacting genes/proteins (STRING) database, the weighted gene coexpression network analysis (WGCNA) package and the differentially-coexpressed genes and links (DCGL) package were used for network construction. A combined network was also constructed with a novel rank-based algorithm using a combined score. The topological features of the 5 networks were analyzed and compared. A total of 941 DE genes were screened. The topological analysis indicated that the gene interaction network constructed using the WGCNA method was more likely to exhibit the small-world property, which is characterized by a small average shortest path length and a large clustering coefficient, whereas the combined network was confirmed to be a scale-free network. Gene pairs identified using the novel combined method were mostly enriched in the cell cycle and p53 signaling pathways. The present study provides a novel perspective on network-based analysis. Each method has advantages and disadvantages. Compared with single methods, the combined algorithm used in the present study may provide a novel and more credible way to analyze gene interactions.
Keywords: empirical Bayesian; lung adenocarcinomas; protein interaction network; topological analysis; weighted gene coexpression network analysis
Year: 2016 PMID: 27588126 PMCID: PMC4998145 DOI: 10.3892/ol.2016.4822
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
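The abstract describes converting the gene-pair scores from several methods and combining them with a rank-based algorithm, but it does not give the formula. The following is only an illustrative sketch of one common way to do this, assuming each method's scores are converted to normalized ranks and then averaged. The function names, the toy gene labels (`G1` to `G3`) and the scores are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def rank_scores(scores):
    """Map each gene pair to a normalized rank in (0, 1]; the top score gets 1.0."""
    ordered = sorted(scores, key=scores.get)  # ascending by score
    n = len(ordered)
    return {pair: (i + 1) / n for i, pair in enumerate(ordered)}

def combine(methods):
    """Average normalized ranks across methods.

    Assumption: a pair that a method does not report contributes 0 for that method.
    """
    totals = defaultdict(float)
    for scores in methods.values():
        for pair, rank in rank_scores(scores).items():
            totals[pair] += rank
    return {pair: total / len(methods) for pair, total in totals.items()}

# Hypothetical toy scores on two different scales (not data from the study).
methods = {
    "EB": {("G1", "G2"): 0.9, ("G1", "G3"): 0.2},
    "STRING": {("G1", "G2"): 800, ("G2", "G3"): 400},
}
combined = combine(methods)

# Pairs sorted by combined score, highest first; a cutoff would select the network edges.
ranked_pairs = sorted(combined, key=combined.get, reverse=True)
```

Working on ranks rather than raw scores makes methods with different score scales comparable, which is the usual motivation for rank-based combination.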
Characteristics of the individual studies included in the present study.
| First author | Year | Access no. | Sample size, total (cases/controls) | Platform | Gene size, bases | Ref. |
|---|---|---|---|---|---|---|
| Shiraishi | 2010 | E-GEOD-10072 | 107 (58/49) | Affymetrix HG-U133A | 12,493 | ( |
| Hou | 2010 | E-GEOD-19188 | 110 (45/65) | Affymetrix HG-U133Plus2 | 20,109 | ( |
| Okayama | 2012 | E-GEOD-31210 | 246 (226/20) | Affymetrix HG-U133Plus2 | 20,109 | ( |
| Yamauchi | | | | | | ( |
| Yap | 2005 | E-MEXP-231 | 58 (49/9) | Affymetrix HG-U133A | 12,493 | ( |
Figure 1. Graphical representation of the topological structures of the gene interaction networks constructed by 4 existing methods. Genes were denoted as nodes, and interactions between gene pairs were presented as edges (lines) in the images. (A) Network identified by the empirical Bayesian method. (B) Network based on the search tool for the retrieval of interacting genes/proteins database. (C) Coexpression network constructed using the differentially-coexpressed genes and links approach. (D) Network based on weighted gene coexpression network analysis.
Figure 2. Combined gene interaction network based on the novel scores of each gene pair across 4 methods. Genes were denoted as nodes, and interactions between gene pairs were presented as edges (lines) in the image. The combined network comprised a total of 280 nodes and 515 edges.
Figure 3. Scattergram of gene degree in the combined network. The combined network is a scale-free network whose degree distribution followed a power law ($y = ax^{b}$, where $a = 121.0$ and $b = -1.315$) with the highest fitting coefficient ($R^2 = 0.977$).
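A power law of the form y = a·x^b becomes a straight line after taking logarithms of both axes, so its parameters can be estimated by ordinary least squares on log-transformed data. The sketch below illustrates that idea only; the source does not state how the fit was performed, and a fit in linear space can give different values. The degree/count data here are synthetic, generated from the reported parameters purely to check that the routine recovers them.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b, r2).

    r2 is the coefficient of determination in log-log space.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope of the log-log line = exponent
    a = math.exp(my - b * mx)     # intercept of the log-log line = log(a)
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Synthetic, noise-free data built from the reported a and b (not the study's data).
degrees = [1, 2, 4, 8, 16]
counts = [121.0 * k ** -1.315 for k in degrees]
a, b, r2 = fit_power_law(degrees, counts)
```

With real degree counts the points scatter around the line, and r2 falls below 1.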
Parameters of 5 networks constructed using 4 existing approaches and a novel algorithm.
| Characteristic | EB | STRING | DCGL | WGCNA | Combination |
|---|---|---|---|---|---|
| Nodes | 703 | 419 | 537 | 79 | 280 |
| Edges | 2,064 | 3,734 | 6,379 | 649 | 515 |
| $R^2$ | 0.963 | 0.931 | 0.938 | 0.264 | 0.977 |
| Clustering coefficient | 0.024 | 0.453 | 0.118 | 0.813 | 0.211 |
| Mean shortest path length | 3.673 | 5.337 | 2.715 | 1.783 | 4.195 |
EB, empirical Bayesian; STRING, search tool for the retrieval of interacting genes/proteins; DCGL, differentially-coexpressed genes and links; WGCNA, weighted gene coexpression network analysis.
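The clustering coefficient and mean shortest path length in the table above are standard topological measures of an undirected graph. As a minimal sketch of how they can be computed, assuming the average of local clustering coefficients and breadth-first-search distances over reachable node pairs (the source does not say which software or exact definitions were used), consider the following. The toy edge list is hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that connect two neighbours of this node.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def mean_shortest_path(adj):
    """Average BFS distance over ordered pairs of distinct, reachable nodes."""
    dist_sum = pairs = 0
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs

# Hypothetical toy network: a triangle (A, B, C) with one pendant node D.
adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
cc = average_clustering(adj)    # (1 + 1 + 1/3 + 0) / 4 = 7/12
mspl = mean_shortest_path(adj)  # 8 / 6 = 4/3
```

A network with a high clustering coefficient and a short mean path length, as reported for WGCNA, matches the small-world pattern described in the abstract.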
Enriched Kyoto Encyclopedia of Genes and Genomes pathways of gene pairs identified by 4 existing methods and a novel algorithm. The EB, STRING, DCGL, WGCNA and Combination columns give the number of gene pairs.
| Pathway | Category | P-value | EB | STRING | DCGL | WGCNA | Combination |
|---|---|---|---|---|---|---|---|
| ECM-receptor interaction | hsa04512 | 0.000098 | 0 | 36 | 3 | 0 | 1 |
| Cell adhesion molecules | hsa04514 | 0.000991 | 1 | 5 | 1 | 0 | 0 |
| p53 signaling pathway | hsa04115 | 0.001466 | 1 | 21 | 1 | 0 | 4 |
| Focal adhesion | hsa04510 | 0.001510 | 1 | 38 | 3 | 0 | 2 |
| Vascular smooth muscle contraction | hsa04270 | 0.002649 | 0 | 7 | 1 | 0 | 1 |
| Cell cycle | hsa04110 | 0.003350 | 1 | 95 | 8 | 2 | 10 |
| Complement and coagulation cascades | hsa04610 | 0.005190 | 0 | 3 | 0 | 0 | 0 |
ECM, extracellular matrix; EB, empirical Bayesian; STRING, search tool for the retrieval of interacting genes/proteins; DCGL, differentially-coexpressed genes and links; WGCNA, weighted gene coexpression network analysis.