Bayarbaatar Amgalan, Hyunju Lee.
Abstract
Sub-networks can expose complex patterns in an entire bio-molecular network by extracting interactions that depend on temporal or condition-specific contexts. When genes interact with each other during cellular processes, they may form differential co-expression patterns with other genes across different cell states. Identifying condition-specific sub-networks is therefore important for investigating how a living cell adapts to environmental changes. In this work, we propose the weighted MAXimum clique (WMAXC) method to identify a condition-specific sub-network. WMAXC first defines scoring functions that jointly measure condition-specific changes in both individual genes and gene-gene co-expressions. It then employs a weaker formulation of the general maximum clique problem and relates the maximum-scored clique of a weighted graph to the optimization of a quadratic objective function under sparsity constraints. We combine a continuous genetic algorithm with a projection procedure to obtain a single optimal sub-network that maximizes the objective function (the scoring function) over the standard simplex (the sparsity constraints). We applied the WMAXC method to simulated data and to real data sets of ovarian and prostate cancer. Compared with previous methods, WMAXC selected a large fraction of cancer-related genes, which were enriched in cancer-related pathways. The results demonstrate that our method efficiently captures a subset of genes relevant under the investigated condition.
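To make the phrase "maximizing a quadratic objective over the standard simplex" concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: the abstract describes a continuous genetic algorithm combined with a projection step, whereas this sketch substitutes simple projected gradient ascent. The matrix `W` (node scores on the diagonal, edge scores off the diagonal), its values, and all function names are assumptions made for illustration only.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:          # holds for the first rho entries only
            theta = t            # keep the threshold of the last valid k
    return [max(x - theta, 0.0) for x in v]


def quadratic_score(x, W):
    """Objective x^T W x for a symmetric score matrix W."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


def projected_ascent(W, steps=100, step_size=0.1):
    """Projected gradient ascent; a stand-in for the genetic algorithm in the paper."""
    n = len(W)
    x = [1.0 / n] * n                      # start at the simplex barycenter
    for _ in range(steps):
        grad = [2.0 * sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = project_to_simplex([x[i] + step_size * grad[i] for i in range(n)])
    return x


# Hypothetical toy scores: nodes 0 and 1 are strongly linked, node 2 is isolated.
W = [[1.0, 2.0, 0.0],
     [2.0, 1.0, 0.0],
     [0.0, 0.0, 0.5]]
x = projected_ascent(W)
selected = [i for i, x_i in enumerate(x) if x_i > 1e-9]  # support = chosen sub-network
```

On this toy matrix the iterate settles on x = (0.5, 0.5, 0), so the support picks out nodes 0 and 1. The simplex constraint is what induces sparsity here: weight that moves toward the high-scoring pair is taken away from the remaining node until it reaches zero.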
Year: 2014 PMID: 25148538 PMCID: PMC4141761 DOI: 10.1371/journal.pone.0104993
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Workflow of WMAXC.
(1) Gene expression data consisting of normal and cancer samples (1A) and the PPI network (1B) are used as inputs. (2) We begin by constructing two responsive networks under the investigated condition. In (2A), two statistical measures are used to construct a bio-molecular network: a node score measures the activity of each gene, and an edge score measures the connectivity between each pair of genes. In (2B), an edge contribution score measures the activity of each interaction in the PPI network, and a node contribution score measures the weighted degree of each gene under the condition. (3) We then combine the two responsive networks to construct the background network by assigning node and edge scores to a set of genes. Orange edges represent gene-gene co-expression estimated from gene expression data alone, and green edges represent the activity of interactions in the PPI network. When the two networks are combined, co-expression edges from (2A) are included even if they are not in the existing PPI network. (4) Finally, we solve the constrained optimization problem to obtain the single optimal sub-network.
Table 1. Comparison on simulated data with COSINE.
| Methods | WMAXC | | | | COSINE | | | |
| Case data | Case 1 | Case 2 | Case 3 | Case 4 | Case 1 | Case 2 | Case 3 | Case 4 |
| sub-network size | 204 | 230 | 219 | 237 | 137 | 178 | 126 | 128 |
| | 0.157 | 0.215 | 0.087 | 0.223 | 0.61 | 0.85 | 0.11 | 0.39 |
| Recall | 1 | 0.97 | 1 | 0.975 | 0.66 | 0.685 | 0.61 | 0.605 |
| Precision | 0.9803 | 0.8434 | 0.9132 | 0.8227 | 0.9635 | 0.7696 | 0.9682 | 0.9453 |
| F-measure | 0.9901 | 0.9023 | 0.9546 | 0.8924 | 0.7833 | 0.7248 | 0.7484 | 0.7378 |
Recall, precision and F-measure are defined as follows: Recall $R = \frac{TP}{TP + FN}$, Precision $P = \frac{TP}{TP + FP}$ and F-measure $F = \frac{2PR}{P + R}$, where TP, FP, and FN represent the numbers of true positives, false positives, and false negatives, respectively.
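The three measures used in the table can be computed with a few lines of Python. This is a minimal sketch using the standard definitions; the function names and the example counts are assumptions for illustration, not taken from the source.

```python
def recall(tp, fn):
    """Fraction of true sub-network genes that were selected."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of selected genes that belong to the true sub-network."""
    return tp / (tp + fp)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Illustrative counts (hypothetical): 90 true positives, 10 false positives,
# 30 false negatives.
p = precision(tp=90, fp=10)   # 0.9
r = recall(tp=90, fn=30)      # 0.75
f = f_measure(p, r)

# Consistency check against the WMAXC Case 1 and Case 2 columns of the table.
case1 = round(f_measure(0.9803, 1.0), 4)    # 0.9901
case2 = round(f_measure(0.8434, 0.97), 4)   # 0.9023
```

The last two lines reproduce the F-measure entries reported for WMAXC in Case 1 and Case 2 from the tabulated precision and recall.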
Table 2. Performance on ovarian cancer data.
| Methods | COSINE | BMRF | WMAXC1 | WMAXC2 |
| | 0.871 | - | 0.173 | 0.2715 |
| Selected genes | 806 | 916 | 567 | 643 |
| Recovered interactions | 275 | 635 | 483 | 2015 |
| Recovered genes | 36 | 58 | 38 | 57 |
| Fold enrichment | 1.237 | 1.753 | 1.828 | 2.454 |
WMAXC1 denotes the results obtained using only gene expression profile data, whereas the WMAXC2 results were obtained by integrating gene expression profiles with PPI network data. ‘Fold enrichment’ was used to evaluate the performance of the methods and was calculated as $\frac{\text{Recovered genes} / \text{Selected genes}}{\text{Reference genes} / \text{All genes}}$, where ‘Selected genes’ is the number of genes selected by the method, ‘Reference genes’ is the number of reference genes from the Ovarian Cancer Dragon Database of genes, ‘Recovered genes’ is the number of reference genes recovered by the method, and ‘All genes’ is the number of genes in the entire network. ‘Recovered interactions’ is the number of interactions recovered from the PPI network.
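Fold enrichment compares how often reference genes appear among the selected genes with how often they appear in the whole network. The sketch below assumes the ratio-of-ratios form implied by the four quantities named in the footnote; the function name and the example counts are hypothetical, since the source does not state the reference-set size or the network size.

```python
def fold_enrichment(recovered, selected, reference, all_genes):
    """(recovered / selected) divided by (reference / all_genes)."""
    if selected <= 0 or reference <= 0 or all_genes <= 0:
        raise ValueError("selected, reference and all_genes must be positive")
    observed_rate = recovered / selected      # hit rate among selected genes
    background_rate = reference / all_genes   # hit rate expected by chance
    return observed_rate / background_rate


# Hypothetical counts: 10 of 100 selected genes are reference genes,
# while 50 of 1000 genes in the network are reference genes.
score = fold_enrichment(recovered=10, selected=100, reference=50, all_genes=1000)
```

A value above 1 means the selected set contains reference genes more densely than the network as a whole; a value of 1 corresponds to a random selection.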
Figure 2. The four candidate genes for ovarian cancer and their neighbor genes in the condition-specific network.
The four candidate ovarian cancer-related genes are colored red, known ovarian cancer-related genes green, cancer-related genes blue, and the remaining genes pink. Edges represent significant co-expression between genes in the ovarian cancer data.
Table 3. Performance on simulated PPI data with different scaling parameters.
| Parameters | Original PPI network | PPI data set-1 | PPI data set-2 | PPI data set-3 |
| | 2.2704 | 1.7681 | 2.0443 | 1.7569 |
| | 2.454 | 2.3092 | 2.2176 | 1.615 |
| | 2.4592 | 2.3843 | 2.1657 | 1.5638 |
PPI data set-1, set-2 and set-3 were generated by randomly removing 30%, 50% and 70% of the edges of the original network, respectively, and then randomly adding the same number of edges. Performance is measured using the fold enrichment described in Table 2.
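The perturbation described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes undirected edges, and it assumes the added edges are drawn from node pairs that were absent from the original network, which the source does not specify. The function name, the seed handling, and the toy network are hypothetical.

```python
import random
from itertools import combinations


def perturb_edges(edges, nodes, fraction, seed=0):
    """Remove `fraction` of the edges at random, then add as many new random edges."""
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}            # undirected edges
    n_swap = round(fraction * len(original))
    removed = set(rng.sample(sorted(original, key=sorted), n_swap))
    kept = original - removed
    # Assumption: new edges come from pairs not present in the original network.
    # Enumerating all pairs is quadratic in the node count; fine for a sketch.
    candidates = [frozenset(p) for p in combinations(sorted(nodes), 2)
                  if frozenset(p) not in original]
    added = set(rng.sample(candidates, n_swap))
    return kept | added


# Toy network (hypothetical): a path on 11 nodes, i.e. 10 edges.
nodes = list(range(11))
edges = [(i, i + 1) for i in range(10)]
noisy_sets = {f: perturb_edges(edges, nodes, f) for f in (0.3, 0.5, 0.7)}
```

Each perturbed network keeps the original edge count, so differences in fold enrichment across the data sets reflect the quality of the edges rather than the density of the network.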