
Construction of protein interaction network involved in lung adenocarcinomas using a novel algorithm.

Juan Chen¹, Hai-Tao Yang², Zhu Li³, Ning Xu⁴, Bo Yu², Jun-Ping Xu², Pei-Ge Zhao², Yan Wang², Xiu-Juan Zhang², Dian-Jie Lin⁵.

Abstract

Studies that assess only differentially-expressed (DE) genes do not contain the information required to investigate the mechanisms of diseases. A complete knowledge of all the direct and indirect interactions between proteins may act as a significant benchmark in the process of forming a comprehensive description of cellular mechanisms and functions. The results of protein interaction network studies are often inconsistent and depend on the method used. In the present study, a combined network was constructed from selected gene pairs, after a novel algorithm was used to convert and combine the gene pair scores obtained from multiple approaches. Samples from patients with and without lung adenocarcinoma were compared, and the RankProd package was used to identify DE genes. The empirical Bayesian (EB) meta-analysis approach, the search tool for the retrieval of interacting genes/proteins (STRING) database, the weighted gene coexpression network analysis (WGCNA) package and the differentially-coexpressed genes and links (DCGL) package were used for network construction. A combined network was also constructed with a novel rank-based algorithm using a combined score. The topological features of the 5 networks were analyzed and compared. A total of 941 DE genes were screened. The topological analysis indicated that the gene interaction network constructed using the WGCNA method was the most likely to exhibit the small-world property, which is characterized by a small average shortest path length and a large clustering coefficient, whereas the combined network was confirmed to be a scale-free network. Gene pairs that were identified using the novel combined method were mostly enriched in the cell cycle and p53 signaling pathways. The present study provides a novel perspective on network-based analysis. Each method has advantages and disadvantages. Compared with single methods, the combined algorithm used in the present study may provide a novel method to analyze gene interactions, with increased credibility.


Keywords: empirical Bayesian; lung adenocarcinomas; protein interaction network; topological analysis; weighted gene coexpression network analysis

Year: 2016 PMID: 27588126 PMCID: PMC4998145 DOI: 10.3892/ol.2016.4822

Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967

Introduction

Lung cancer is the main cause of cancer-associated mortality, and annually results in >1 million mortalities globally (1). Lung adenocarcinomas (ADCs) constitute a biologically heterogeneous group of lung tumors, and are, at present, the most common type of lung cancer (2). Previous studies have reported that gene expression profiling can be used to divide lung ADC into several subgroups and to distinguish primary cancers from metastases of extrapulmonary origin. Lung ADCs show striking variation in expression patterns compared with squamous cell lung carcinomas or small cell lung carcinomas (3). A method that is often used to investigate the histopathology of a disease is the study of microarray data to identify genetic signatures. The identification of genes that are differentially expressed (DE) between two types of tissue samples, or between samples obtained under two experimental conditions, is a typical task in the analysis of microarray data (4). RankProd is a method often used for detecting DE genes in replicated microarray experiments (5). RankProd is a non-parametric statistical method derived from biological reasoning that detects items that are consistently highly ranked in a number of lists (6). The method confers a number of advantages over linear modeling, including the biological intuition of the fold-change (FC) criterion, fewer model assumptions, and increased performance with noisy data or low numbers of replicates (7). However, the method does not account for other types of differential regulation, including differential coexpression (DC). Therefore, the empirical Bayesian (EB) approach was introduced. The EB method provides a false discovery rate (FDR)-controlled list of significant pairs and pair-specific posterior probabilities that may be used in the identification of particular DC types (8). EB may also be used for the model-based inference of cellular signaling networks (9). 
A necessary requirement for any systems-level understanding of cellular functions is the correct identification and annotation of all functional interactions among cell proteins (10). Functional links between proteins may often be inferred from genomic associations between their encoding genes (11). The search tool for the retrieval of interacting genes/proteins (STRING) database is a precomputed global resource for the investigation and analysis of protein associations (12). The database provides uniquely comprehensive coverage and ease of access to experimental and predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information, including protein domains and three-dimensional structures, is made available within a stable and consistent identifier space (10). In addition, correlation networks are increasingly being used in bioinformatics applications. The weighted gene coexpression network analysis (WGCNA) package is a comprehensive collection of R functions designed to perform various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization and interfacing with external software (13). WGCNA has been used to identify prognostic markers for endometrial cancer (14). In addition, from the perspective of systems biology, gene coexpression analysis is useful for investigating gene interconnection at the expression level. The differentially-coexpressed genes and links (DCGL) R package may be used to identify differentially-coexpressed genes (DCGs) and links from gene expression microarray data (15). A comparison between cellular networks may provide insight into biological understanding and therapeutics. 
However, the direct comparison of large networks is infeasible; therefore, heuristic measures, including the degree distribution, clustering coefficient, diameter and relative graphlet frequency distribution, are used (16). The analysis of network topological features may elucidate complex cellular mechanisms and processes, and provide insight into the evolutionary aspects of the proteins involved (17). Previously, topological analysis of mass-balanced signaling networks has been performed and used as a framework to obtain network properties, including crosstalk (18). Similar to numerous other biological and real-world networks, protein interaction networks also exhibit the established small-world phenomenon (19) and scale-free property (20). The small-world network, which has a small average shortest path length and a large clustering coefficient, may enable a rapid integration of information (21). The scale-free network, in which the node degree distribution follows a power law, is characterized by a small number of highly connected nodes, whereas the majority of nodes interact with only a few neighbors. Such a network also demonstrates an increased robustness against random failure. In the present study, samples from patients with and without lung ADC were compared in order to find novel molecular targets for lung ADC treatment. First, the RankProd package was used to identify DE genes. Next, the EB coexpression meta-analysis, STRING approach, WGCNA package and DCGL package were used for gene interaction network construction. Each method has various advantages and weaknesses. In order to take the non-uniform outcomes from the various approaches into consideration, a novel algorithm was applied to combine the 4 existing methods to identify gene pairs and networks. The topological features of the 5 networks, including clustering coefficient, average shortest path length and degree distribution, were compared and analyzed. 
The present study may increase the future understanding of gene interactions, increase the credibility of current methods and be important for the understanding of the molecular mechanisms of lung ADC.

Materials and methods

Data collection and preprocessing

The microarray expression profiles of patients with and without lung ADC were downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) under the accession numbers E-GEOD-10072 (22), E-GEOD-19188 (23), E-GEOD-31210 (24,25) and E-MEXP-231 (26). In all datasets, only data from patients with lung ADC and from control subjects without lung ADC were retained. The sample characteristics, platform and gene expression data were also extracted from each dataset and the associated study. The characteristics of the studies are shown in Table I.

Table I.

Characteristics of the individual studies included in the present study.

First author	Year	Accession no.	Sample size, total (cases/controls)	Platform	Genes, n	Ref.
Shiraishi et al	2010	E-GEOD-10072	107 (58/49)	Affymetrix HG-U133A	12,493	(22)
Hou et al	2010	E-GEOD-19188	110 (45/65)	Affymetrix HG-U133Plus2	20,109	(23)
Okayama et al and Yamauchi et al	2012	E-GEOD-31210	246 (226/20)	Affymetrix HG-U133Plus2	20,109	(24,25)
Yap et al	2005	E-MEXP-231	58 (49/9)	Affymetrix HG-U133A	12,493	(26)

Prior to analysis, the original expression data from all conditions were subjected to preprocessing. The probe-level data in the CEL files were read using the affy package (bioconductor.org/packages/affy) and converted into expression measures. Background correction was performed using the robust multiarray average algorithm (27) to eliminate the effects of non-specific hybridization. Data were normalized using the quantile method (28). Perfect match/mismatch correction was performed using the Micro Array Suite 5.0 algorithm (29). Expression values were summarized using the median polish method (30). The featureFilter function in the genefilter package (bioconductor.org/packages/genefilter) was used to filter the data and for probe annotation. The getSYMBOL function of the annotate package (bioconductor.org/packages/annotate) was used to map the probes to gene symbols (31). All preprocessing was performed using the expresso function, which is provided by the affy package (32,33). For gene symbols with multiple probes, the average expression value was used.
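The quantile normalization step can be illustrated with a minimal Python sketch. This is not the affy implementation: the function name quantile_normalize and the toy intensities are hypothetical, and ties are resolved simply by sort order, which is an assumption made for brevity.

```python
def quantile_normalize(columns):
    """Quantile-normalize arrays given as a list of equal-length columns.

    Each column holds the probe intensities of one microarray.
    Ties are broken by position (a simplification of this sketch).
    """
    n_rows = len(columns[0])
    # Sort every array, then average across arrays at each rank position.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns)
                  for r in range(n_rows)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            # Replace each value by the cross-array mean for its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays with three probes each (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization, every array shares the same set of values, so the arrays have identical empirical distributions and differ only in which probe holds which value.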

Identification of DE genes

The RankProd approach (6,34) was employed to identify the DE genes associated with lung ADC. RankProd is implemented in the statistical programming language R as a package of the open-source Bioconductor project (35). The microarray expression data from the individual datasets were combined to detect DE genes using the RPadvance function in the RankProd package. The P-values for all genes were transformed to −log2(P). Only the genes with a percentage of false-positives (PFP) value of ≤0.05 were considered to be DE between lung ADC samples and controls.
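The idea behind the rank product statistic can be sketched in a few lines of Python. This is a simplified illustration, not the RankProd package: the function rank_product and the toy fold changes are hypothetical, only the upregulated direction is ranked, and the permutation-based PFP estimate is omitted.

```python
import math


def rank_product(fold_changes):
    """Geometric mean of per-dataset ranks for each gene.

    fold_changes maps a gene name to a list of log fold changes,
    one per dataset. Rank 1 is the most upregulated gene.
    """
    genes = list(fold_changes)
    n_sets = len(next(iter(fold_changes.values())))
    log_sum = {g: 0.0 for g in genes}
    for k in range(n_sets):
        # Rank the genes within dataset k by decreasing fold change.
        ordered = sorted(genes, key=lambda g: fold_changes[g][k], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return {g: math.exp(log_sum[g] / n_sets) for g in genes}


# Toy log fold changes across three datasets (made-up values).
toy = {
    "geneA": [2.1, 1.8, 2.5],
    "geneB": [0.1, -0.2, 0.3],
    "geneC": [-1.5, -1.1, -2.0],
}
rp = rank_product(toy)
```

Genes that are consistently near the top of every list receive a small rank product; in the full method, significance is then assessed by comparing the observed values with those obtained from permuted data.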

Construction of protein interaction networks for DE genes

Identification of DC using the EB approach

At present, numerous approaches have been used to identify DC gene pairs; however, gene pairs are prone to false identification when the space to be interrogated has a large cardinality (36). Therefore, the EB method was applied, which provided an FDR-controlled list of significant pairs and pair-specific posterior probabilities (8). To achieve this, the EBcoexpress package in R was employed to conduct the differential coexpression analysis (37). The EB approach is applicable within a single study and across multiple studies. In the single study analysis, 3 inputs were required: X, the array conditions and the pattern object (37). For X, an m-by-n matrix of expression values was used, where m is the number of genes or probes under consideration and n is the total number of microarrays over all conditions. The values were normalized using background normalization and median correction methods to give all the arrays equal median expression. Generally, gene expression levels are transformed to a log2 scale. For the array conditions, an array of length n was provided, the members of which took values in 1, …, K, where K is the total number of conditions. All microarrays and assays were placed in the same order as the n columns of X. An EBarrays pattern object was used to define the equivalent coexpression/DC classes. Next, the makeMyD() function was used with biweight midcorrelation to calculate the intra-group correlations, D, for all p = m(m − 1)/2 gene pairs. The initializeHP() function, which uses the Mclust algorithm, was then used to identify the normal mixture model that best fits the correlations in D, including the component means, standard deviations and weights. These values were used to initialize the expectation-maximization (EM) algorithm. 
Three functions implement the different versions of the modified EM approach: the zero-step, one-step and full versions. The full version runs a complete two-cycle alternating expectation-conditional maximization. The zero-step version uses the initial estimates of the hyperparameters to generate posterior probabilities of DC. Subsequent to using the aforementioned algorithms, the priorDiagnostic() function was used to check the prior distribution selected by the EM. Finally, the crit.fun() function was used to provide a soft threshold and simulations to identify the DC gene pairs. DC gene pairs were distinguished from gene pairs with invariant coexpression by controlling the posterior expected FDR at 0.05, and the coexpression network was constructed to account for the correlation between each pair of genes in the study. A curve was then fitted to the node degree distribution of the network.
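As an illustration of the correlation step only, the following Python sketch computes a biweight midcorrelation for one gene pair and the number of gene pairs p = m(m − 1)/2. It is not the EBcoexpress code: the helper names are hypothetical, the definition used is the commonly documented one (weights based on 9 times the raw median absolute deviation), and the case of a zero median absolute deviation is not handled.

```python
import math
from statistics import median


def _bicor_terms(values):
    """Return the normalized, robustly weighted deviations of one profile."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # raw median absolute deviation
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Observations far from the median receive a weight of zero.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation between two expression profiles."""
    return sum(a * b for a, b in zip(_bicor_terms(x), _bicor_terms(y)))


def n_pairs(m):
    """Number of unordered gene pairs among m genes."""
    return m * (m - 1) // 2


# Toy profiles (made-up values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [-v for v in x]
same = bicor(x, x)
opposite = bicor(x, y)
```

The pair count grows quadratically with the number of genes; for example, if all 941 DE genes were used as input, n_pairs(941) gives 442,270 candidate pairs, which is why FDR control over the pair space matters.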

Protein interactions obtained from STRING database

At present, protein or gene interactions and associations are annotated in online resources at various levels of detail, ranging from raw data repositories to highly formalized pathway databases. STRING aims to simplify access to this information by providing a comprehensive, yet quality-controlled, collection of protein-protein associations for a large number of organisms. The majority of the available information on protein or gene associations, both known and predicted, may be aggregated, scored and weighted. Therefore, protein interactions across diverse experimental conditions may be measured and used as a predictor of functional associations in STRING, as in the present study. STRING employs 2 strategies to transfer known and predicted associations between organisms (11). Subsequent to the assignment of association scores and transfer between species, a combined score between any pair of proteins was computed, which is higher than the individual sub-scores and expresses increased confidence. The combined score accounted for the predicted and known scores obtained for each protein interaction from the STRING database, and was calculated according to the following formula: where S is the combined score for the interaction between proteins A and B, and S_i is the score, normalized by the largest value, calculated for method i. A graphical protein-protein interaction (PPI) network was then constructed and the topological features of the network were analyzed.
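The formula itself did not survive in the text above. A commonly documented way in which STRING integrates evidence channels is S = 1 − Π(1 − S_i), and the sketch below assumes that form; it is a simplification that omits the prior correction STRING applies in practice, and the function name combined_score and the sub-score values are hypothetical.

```python
def combined_score(sub_scores):
    """Combine per-method scores in [0, 1] as 1 - prod(1 - S_i).

    Assumed form of the combination; the prior correction used by
    STRING itself is deliberately left out of this sketch.
    """
    remaining = 1.0
    for s in sub_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("sub-scores must lie in [0, 1]")
        # Probability that no evidence channel supports the interaction.
        remaining *= 1.0 - s
    return 1.0 - remaining


# Toy sub-scores for one protein pair (made-up values).
score_two = combined_score([0.5, 0.5])
score_three = combined_score([0.4, 0.7, 0.2])
```

Under this form, the combined score is never lower than the largest sub-score, which matches the statement that the combined score expresses increased confidence.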

Identification of weighted correlation network

Correlation networks are increasingly being used in bioinformatics applications, and WGCNA has been used to describe the correlation patterns among genes across microarray samples (38). WGCNA may be used to identify clusters or modules of highly correlated genes, to summarize clusters using the module eigengene or an intramodular hub gene, to associate modules with one another and with external sample traits using eigengene network methodology, and to calculate module membership measures (13). The WGCNA R package may be used to compute a gene selection score, termed ‘p.weighted’, based on the significance of the gene and module membership. The smaller the p.weighted value, the stronger the evidence that the gene is a disease-associated hub gene. In the present study, the threshold for the p.weighted score was set at ≤0.55. The weighted coexpression was determined by calculating a correlation matrix that contained the pairwise Pearson correlations between all probe sets across all subjects. The network nodes corresponded to genes, and the edges between genes were determined by the pairwise Pearson correlation between their expression profiles. By raising the absolute value of the Pearson correlation to a power β≥1 (soft thresholding), the weighted gene coexpression network construction emphasized the stronger correlations. The adjacency of an unsigned weighted gene coexpression network was calculated as a_ij = |cor(x_i, x_j)|^β, where x_i and x_j are the expression profiles of genes i and j. The soft threshold β=6 was chosen using the scale-free topology criterion (39). Positive and negative correlations were treated equally, and the adjacency took a value between 0 and 1. Following the construction of the weighted correlation network, the topological features of the network were analyzed.
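The unsigned adjacency with soft thresholding can be sketched as follows. This is a plain-Python illustration rather than the WGCNA package; the function names pearson and unsigned_adjacency and the toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def unsigned_adjacency(expr, beta=6):
    """Adjacency |cor(x_i, x_j)|^beta for every unordered gene pair."""
    genes = list(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            # The absolute value treats positive and negative correlation alike.
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


# Toy expression profiles across four samples (made-up values).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with g1
    "g3": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with g1
}
adj = unsigned_adjacency(expr, beta=6)
```

With β=6, a correlation of 0.8 is reduced to an adjacency of about 0.26, whereas a perfect correlation of either sign keeps an adjacency of 1; this is how soft thresholding emphasizes the stronger correlations without discarding the weaker ones outright.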

Identification of DC network

In order to identify DCGs and differentially-coexpressed links (DCLs) from gene expression microarray data (40), the DCGL 2.0 package in R was used (15,41). The DCp and DCe functions, which belong to the DC analysis module of the DCGL package (40), were used to extract DCGs and DCLs. DCp operates on a filtered set of gene coexpression value pairs, where each pair is composed of 2 coexpression values that are calculated separately under 2 different conditions. For a given gene, the subset of pairs was written as 2 vectors, X and Y, where n is the number of coexpression neighbors of the gene. The DC of the gene was defined with the following equation: The novel Pearson correlation coefficient (PCC) was calculated and gene pairs were filtered based on the novel PCC with a q-value of 0.05. The DCe function, which is based on the limit fold-change (LFC) model, may also be used to identify DCGs and DCLs. The correlation pairs were divided into 3 parts, according to the pairing of signs and the magnitude of the coexpression values, as follows: Pairs with the same signs (N1); pairs with differing signs (N2); and pairs with differently-signed high coexpression values (N3). N1 and N2 were processed with the LFC model separately to produce 2 subsets of DCLs (K1 and K2). N3 was added to the set of DCLs directly. For a gene (g_i), the total number of links (n_i) and the number of DCLs in particular (k_i) associated with it were counted. The DC of gene i measured using the DCe method was expressed by the following equation: In the process, gene pairs with a correlation value of ≥0.65 were considered to be significantly coexpressed (15,42). Finally, DCGs were mapped into Cytoscape software (www.cytoscape.org) for construction of the coexpression network, and the topological features of the network were analyzed.
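The two equations are missing from the text above. The sketch below assumes the forms usually given for these DCGL measures: for DCp, the root mean square difference between the two coexpression vectors, dC = sqrt(Σ(x_j − y_j)² / n); for DCe, a binomial tail probability of observing at least k_i DCLs among the n_i links of a gene, given the overall DCL fraction K/N. Both forms are assumptions of this sketch, and the function names and toy values are hypothetical.

```python
import math


def dcp_measure(x, y):
    """Assumed DCp measure: RMS difference of coexpression values.

    x and y hold the coexpression values of one gene with its n
    neighbors under the two conditions.
    """
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


def dce_pvalue(k_i, n_i, total_dcls, total_links):
    """Assumed DCe measure: P(at least k_i DCLs among n_i links).

    total_dcls (K) and total_links (N) describe the whole network.
    """
    p0 = total_dcls / total_links
    return sum(
        math.comb(n_i, x) * p0 ** x * (1.0 - p0) ** (n_i - x)
        for x in range(k_i, n_i + 1)
    )


# Toy values (made up): one gene with two neighbors.
dc_changed = dcp_measure([0.9, 0.8], [0.1, -0.8])
dc_unchanged = dcp_measure([0.9, 0.8], [0.9, 0.8])
p_enriched = dce_pvalue(k_i=2, n_i=2, total_dcls=1, total_links=2)
```

In this reading, a gene whose coexpression values change substantially between conditions has a large DCp value, while a gene with more DCLs than expected by chance has a small DCe probability.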

Conversion and combination of the gene association scores of the 4 methods

The score of each gene pair was obtained following the analysis of gene interactions using the aforementioned methods. Since the results varied between approaches, all the scores were converted to a uniform standard. Therefore, a novel algorithm was applied to convert the scores of all the gene pairs in the present study. The conversion equation was as follows: where S was the combined score of each gene pair with integrated multiple results, n was the number of methods (n=4 in the present study), M was the number of gene pairs of the DE genes and N was the rank of a pair of genes. A novel score for each gene pair was obtained by calculating the mean, that is, by dividing the combined score by the number of methods. Next, gene pairs were ranked based on the novel scores, and the pairs that satisfied the criterion N/M ≤10%, or equivalently −2 log2(N/M) ≥6.643856, were selected. The combined gene interaction network of the selected gene pairs was then constructed and the topological features of the network were analyzed.

Topological analysis

The clustering coefficient and average shortest path length of the aforementioned 5 networks were obtained and compared to investigate whether the networks constructed using the 5 methods exhibited small-world properties. In addition, the coefficient of determination (R²) of the power-law fit (y = ax^b) to the degree distribution was compared across the 5 networks, as PPI networks in general are modular and scale-free, meaning that they have power-law degree distributions (28,43). The NetworkAnalyzer 2.7 plugin in Cytoscape 3.1.0 was used for the evaluation of topological parameters.
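The conversion equation for the rank-based combination is not reproduced in the text. Judging from the selection threshold, in which −2 log2(0.10) = 6.643856, one plausible reading is that each method contributes a converted score of −2 log2(N_i/M), that these are summed over the n methods, and that the mean is compared with the threshold. The Python sketch below implements this reading only; it is an assumption, not the authors' published formula, and the function names and ranks are hypothetical.

```python
import math

# Threshold corresponding to a rank fraction N/M of 10%.
THRESHOLD = -2.0 * math.log2(0.10)


def converted_score(rank, n_pairs):
    """Assumed per-method score: -2 * log2(rank / number of pairs)."""
    return -2.0 * math.log2(rank / n_pairs)


def mean_combined_score(ranks, n_pairs):
    """Sum the converted scores over all methods and divide by n."""
    total = sum(converted_score(r, n_pairs) for r in ranks)
    return total / len(ranks)


def is_selected(ranks, n_pairs):
    """Keep a gene pair if its mean score reaches the threshold."""
    return mean_combined_score(ranks, n_pairs) >= THRESHOLD


# Toy ranks of two gene pairs in the 4 methods, out of 1,000 pairs (made up).
strong_pair = [1, 5, 20, 50]
weak_pair = [500, 600, 700, 800]
keep_strong = is_selected(strong_pair, 1000)
keep_weak = is_selected(weak_pair, 1000)
```

Under this reading, the mean score equals −2 log2 of the geometric mean of the rank fractions, so a pair is kept when its ranks are, on a geometric average, within the top 10%. How pairs that are absent from one of the methods were ranked is not stated in the text, and the sketch does not address it.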

Functional enrichment analysis

Highly connected gene pairs generally participate in similar biological processes and pathways. In order to investigate the functional enrichment of the identified gene pairs, a pathway enrichment analysis was performed based on the Kyoto encyclopedia of genes and genomes (KEGG; www.genome.jp/kegg/). The DE genes identified by RankProd were first imported into the online database for annotation, visualization and integrated discovery (DAVID; http://david.abcc.ncifcrf.gov/tools.jsp), and all the pathways enriched in the DE genes were obtained. Next, with the DE genes in each pathway as a background, the numbers of enriched gene pairs identified by the 4 existing methods and the combined approach were calculated and compared. Terms with P<0.01 were considered to be significantly enriched.

Results

Following the normalization and preprocessing of the expression profile datasets, a total of 12,493 genes in E-GEOD-10072, 20,109 genes in E-GEOD-19188, 20,109 genes in E-GEOD-31210 and 12,493 genes in E-MEXP-231 were obtained. Of those genes, 12,493 were common to all datasets. By applying the RankProd package for meta-analysis, a total of 941 genes (386 upregulated and 555 downregulated) were considered to be DE, with a PFP value of ≤0.05 and an FC value of >2.

Topological analysis of 5 protein interaction networks

The protein interaction networks of DE genes were constructed using EB, STRING, DCGL and WGCNA (Fig. 1), and the association between gene pairs was determined. Subsequently, a novel algorithm was implemented to combine the score values of all gene pairs obtained from the 4 existing approaches. A novel matrix with a combined score of each gene pair was produced and a simple rank-based permutation procedure was used. Next, the combined gene interaction network was also constructed, consisting of 280 nodes and 515 edges (Fig. 2).

Figure 1.

Graphical representation of the topological structures of the gene interaction networks constructed by 4 existing methods. Genes were denoted as nodes, and interactions between gene pairs were presented as edges (lines) in the images. (A) Network identified by the empirical Bayesian method. (B) Network based on the search tool for the retrieval of interacting genes/proteins database. (C) Coexpression network constructed using the differentially-coexpressed genes and links approach. (D) Network based on weighted gene coexpression network analysis.

Figure 2.

Combined gene interaction network based on the novel scores of each gene pair across 4 methods. Genes were denoted as nodes and interactions between gene pairs were presented as edges (lines) in the image. The combined network comprised a total of 280 nodes and 515 edges.

Network analysis showed that 4 of the 5 networks exhibited the scale-free property, with degree distributions that followed a power law with high fitting coefficients (R²); the exception was the network constructed using the WGCNA method (R²=0.264). The combined network showed the highest fitting coefficient (R²=0.977) among the 5 networks (Fig. 3), which indicates a pronounced scale-free property and increased robustness against random failure compared with the other networks. By contrast, the network constructed using the WGCNA method was the most likely to be a small-world network, with the smallest mean shortest path length (1.783) and the largest clustering coefficient (0.813). The detailed parameters of the 5 networks are shown in Table II.
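The two small-world measures discussed above can be computed for a toy graph with the following Python sketch. It is not the NetworkAnalyzer implementation: the function names and the example edges are hypothetical, nodes with fewer than 2 neighbors are assumed to have a clustering coefficient of 0, and the mean shortest path length is averaged over connected node pairs only.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for such nodes
        nbr_list = list(nbrs)
        # Count the edges that exist among the neighbors of this node.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in graph[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def mean_shortest_path(graph):
    """Mean shortest path length over connected ordered node pairs."""
    total, count = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
cc = average_clustering(toy)
mspl = mean_shortest_path(toy)
```

A network is usually described as small-world when the first value is large and the second is small relative to a random network of the same size, which is the pattern reported here for the WGCNA network.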

Figure 3.

Scatter plot of node degree in the combined network. The combined network is a scale-free network, the degree distribution of which followed a power law (y = ax^b, where a=121.0 and b=−1.315) with the highest fitting coefficient (R²=0.977).
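A power-law fit of this kind can be sketched as an ordinary least-squares regression on log-transformed values. The sketch assumes that the fit and R² are computed on the logarithmic scale, which may differ from the procedure used by the plugin; the function name fit_power_law is hypothetical, and the degree counts are synthetic values generated from the reported a and b, not the study data.

```python
import math


def fit_power_law(degrees, counts):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b, r_squared), with R^2 computed on the log scale.
    """
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope on the log-log scale
    log_a = my - b * mx          # intercept on the log-log scale
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Synthetic, noise-free degree distribution built from the reported parameters.
degrees = [1, 2, 4, 8, 16]
counts = [121.0 * d ** -1.315 for d in degrees]
a, b, r2 = fit_power_law(degrees, counts)
```

Because the synthetic data contain no noise, the fit recovers the parameters used to generate them; with real degree counts, R² falls below 1 in proportion to the scatter around the fitted line.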

Table II.

Parameters of 5 networks constructed using 4 existing approaches and a novel algorithm.

Characteristic	EB	STRING	DCGL	WGCNA	Combination
Nodes	703	419	537	79	280
Edges	2,064	3,734	6,379	649	515
R²	0.963	0.931	0.938	0.264	0.977
Clustering coefficient	0.024	0.453	0.118	0.813	0.211
Mean shortest path length	3.673	5.337	2.715	1.783	4.195

EB, empirical Bayesian; STRING, search tool for the retrieval of interacting genes/proteins; DCGL, differentially-coexpressed genes and links; WGCNA, weighted gene coexpression network analysis.

Functional enrichment analysis

All the KEGG pathways enriched in the DE genes were obtained as background, and 7 significant terms were identified: extracellular matrix-receptor interaction (P=0.0000977), cell adhesion molecules (P=0.000991), p53 signaling pathway (P=0.00147), focal adhesion (P=0.00151), vascular smooth muscle contraction (P=0.00265), cell cycle (P=0.00335), and complement and coagulation cascades (P=0.00519). In order to investigate the enriched pathways of the gene pairs identified by the various methods, the number of gene pairs enriched in each pathway was calculated and compared (Table III). Following the combination of the 4 existing methods, the gene pairs were mostly enriched in the cell cycle and p53 signaling pathways. The pathway common to the gene pairs identified by all 5 methods was the cell cycle.

Table III.

Enriched Kyoto encyclopedia of genes and genomes pathways of gene pairs identified by 4 existing methods and a novel algorithm.

			Number of gene pairs

Pathway	KEGG ID	P-value	EB	STRING	DCGL	WGCNA	Combination
ECM-receptor interaction	hsa04512	0.000098	0	36	3	0	1
Cell adhesion molecules	hsa04514	0.000991	1	5	1	0	0
p53 signaling pathway	hsa04115	0.001466	1	21	1	0	4
Focal adhesion	hsa04510	0.001510	1	38	3	0	2
Vascular smooth muscle contraction	hsa04270	0.002649	0	7	1	0	1
Cell cycle	hsa04110	0.003350	1	95	8	2	10
Complement and coagulation cascades	hsa04610	0.005190	0	3	0	0	0

ECM, extracellular matrix; EB, empirical Bayesian; STRING, search tool for the retrieval of interacting genes/proteins; DCGL, differentially-coexpressed genes and links; WGCNA, weighted gene coexpression network analysis.

Discussion

In the present study, a novel algorithm that combined multiple existing approaches was applied in order to better understand the molecular mechanisms of lung ADC. First, samples from patients with and without lung ADC were compared. Next, the RankProd package was used to identify DE genes, and a total of 941 DE genes were screened across 4 datasets. Based on these DE genes, gene interaction networks were constructed, and the score value of each gene pair was obtained using the EB coexpression approach, STRING database, DCGL method and WGCNA package. A novel algorithm was applied to convert and combine the score values that were obtained from the aforementioned methods; a novel matrix with a combined score of each gene pair was then produced and sorted using a rank-based method. Finally, the combined gene interaction network was constructed via linking gene pairs. A map of PPIs may provide useful revelations with regard to the cellular function and machinery of a proteome (44). A variety of methods have been proposed for the analysis of gene expression microarray data; however, few methods exist that use microarray data to quantify the interassociated behavior of genes within a gene interaction network (45). The incidence of cancer is considered to be closely associated with the abnormal expression of numerous genes; however, the previous methods used to study DE genes are inadequate, as there is a large difference between identifying DE genes and understanding the complex mechanisms of cancer. Therefore, the study of gene interactions is essential, as gene interactions are important for biological processes (46). Network-based approaches utilizing interaction information between gene pairs have emerged as powerful tools for the systematic understanding of the molecular mechanisms underlying biological processes, and a number of algorithms have been created to study these biological networks. 
Barter et al (47) performed a comparative analysis and indicated that the network-based method was more stable compared with single-gene and gene-set methods. Wu et al (48) also developed a network-based differential gene expression (nDGE) analysis, and demonstrated using simulated data sets that nDGE outperformed existing methods for the prioritization of deregulated genes and the identification of deregulated gene modules. Furthermore, a study conducted by Li et al (49) used a network-based approach to identify several key genes that were closely associated with survival in patients with lung ADC. The topological properties of gene interaction networks have been studied widely. Gene interaction networks have been shown to exhibit small-world and scale-free properties (50,51), which are typical of biological networks. Featherstone and Broadie (52) demonstrated that the scale-free property of the gene interaction network aided organisms by conferring resistance to the deleterious effects of mutation. A similar architecture was also identified in the gene coexpression network of gastric cancer (53). The small-world property of biological networks has also been confirmed in multiple data sources (43). By contrast, Arita (54) indicated that the metabolic network of Escherichia coli was not a small-world network, but a network with a mean shortest path length that was much longer than previously hypothesized. In the present study, 5 gene interaction networks of lung ADC were constructed using 4 existing approaches and a novel combined algorithm. The network built using the WGCNA method was the most likely to be a small-world network, with the smallest mean shortest path length and the largest clustering coefficient. However, the combined network was revealed to be a scale-free network, with a node degree distribution that followed a power law with the highest fitting coefficient. 
Generally, gene pairs that are closely connected participate in the same pathway. Using a systems biology-based network approach, Li et al (49) suggested that alterations in cell cycle genes and pathways were associated with tumor grade and contributed to the survival of patients with lung ADC, regardless of smoking status. The study conducted by Wu et al (48) also identified that cell cycle-associated genes played a role in the molecular variations between lung ADC in smokers and non-smokers. A study of cisplatin in lung ADC demonstrated that cisplatin exerted a cytotoxic effect through the blockage of the cell cycle pathway, and that this effect may be partly regulated by the p53 signaling pathway. Consistent with previous studies, the findings of the present study suggested that, subsequent to combination, the gene pairs were mainly enriched in the cell cycle and p53 signaling pathways, and that the cell cycle pathway was the pathway common to the gene pairs identified by all 5 methods. In the present study, 4 existing network-based approaches were applied. Evidently, different methods have different strengths. Therefore, a novel merged approach was created to enhance stability and reliability. The combined gene interaction network was constructed by combining the scores of gene pairs from the 4 existing methods. Network analysis showed that the network constructed using the WGCNA method was more inclined to exhibit the small-world property, whereas the combined network demonstrated scale-free network features. In addition, pathway analysis demonstrated that the cell cycle pathway was involved in the pathogenesis of lung ADC. When considering the applications and limitations of each of the methods, the novel merged algorithm outlined in the present study may provide a more credible and robust outcome for genetic network analyses, and is recommended for future application.
