Biaobin Jiang, Michael Gribskov.
Abstract
Subnetwork detection is often used with differential expression analysis to identify modules or pathways associated with a disease or condition. Many computational methods are available for subnetwork analysis. Here, we compare the results of eight methods: simulated annealing-based jActiveModules, greedy search-based jActiveModules, DEGAS, BioNet, NetBox, ClustEx, OptDis, and NetWalker. These methods represent distinctly different computational strategies and are among the most widely used. Each of these methods was used to analyze gene expression data consisting of paired tumor and normal samples from 50 breast cancer patients. While the number of genes/proteins and protein interactions detected by the eight methods varies widely, a core set of 60 genes and 50 interactions was found to be shared by the subnetworks identified by five or more of the methods. Within the core set, 12 genes were found to be known breast cancer genes.
Keywords: TCGA; breast cancer; network biology; pathway analysis; subnetwork detection
Year: 2014 PMID: 25520555 PMCID: PMC4256043 DOI: 10.4137/CIN.S17641
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Overview of eight methods.
| METHOD | ALGORITHM | TOOL TYPE | REF. | INPUT NETWORK | INPUT EXPRESSION | RUNNING TIME (MIN) |
|---|---|---|---|---|---|---|
| jAM.SA | Simulated annealing | Cytoscape | | HPRD | Adjusted | ∼40 |
| jAM.GR | Greedy search | Cytoscape | | HPRD | Adjusted | ∼4 |
| DEGAS | Greedy heuristic | GUI | | HPRD | Normalized counts | ∼3 |
| BioNet | Integer linear programming | R package | | HPRD | | ∼7 |
| NetBox | Shortest path | Python, Java | | Preload | Seed genes | ∼100 |
| ClustEx | Clustering, shortest path | C & GUI | | HPRD | Seed genes | ∼150 |
| OptDis | Color coding | C | | HPRD | Normalized counts | ∼1560 |
| NetWalker | Random walks | GUI | | Preload | Adjusted | ∼0.1 |
Notes: jAM.SA denotes jActiveModules using Simulated Annealing; jAM.GR denotes jActiveModules using Greedy Search.
Performance summary of the eight methods.
| METHOD | DESCRIPTION | ADVANTAGE | LIMITATION |
|---|---|---|---|
| jAM.SA | Uses simulated annealing to search for the most highly scored subnetwork | Accepts low-scored genes with a certain probability | Produces large subnetwork; Slow |
| jAM.GR | Extends a subnetwork by adding one of its neighboring genes that maximizes a mutual information–based objective function | Fast; uses mutual information to evaluate subnetwork quality | Does not accept low-scored genes; high tendency to be trapped in a suboptimal solution |
| DEGAS | Models subnetwork detection as a Connected Set Cover problem and solves it using a greedy heuristic | Fast; able to detect differentially expressed genes; does not require weights of genes as inputs | Many parameters that need to be tuned |
| BioNet | The first exact approach. Models subnetwork detection as a Prize-Collecting Steiner Tree problem and solves it using Integer Linear Programming | Fast; produces a single small subnetwork with high coverage of significant genes | Produces single small output subnetwork with a high false-negative rate (low recall) |
| NetBox | Computes the shortest paths between genes in a given seed set and optimizes the size of subnetwork by adding the smallest number of linker genes on those paths | High coverage of significant genes (true positive rate) with the smallest number of insignificant genes (false positive rate) | Produces multiple small and isolated subnetworks |
| ClustEx | First performs hierarchical clustering to split the whole network into co-expressed modules, then extracts subnetworks from the modules using shortest paths to connect significant genes | Combines clustering and shortest paths to detect highly co-expressed subnetworks | Produces multiple isolated subnetworks involving many genes |
| OptDis | Uses color coding technique to search for optimally discriminative subnetworks | Good coverage over significant genes, with small subnetworks | Cannot detect large subnetworks (over 20 genes); very slow |
| NetWalker | Diffuses information flows by random walks to prioritize important genes and interactions in the stationary state | Very fast, friendly GUI | Only produces scores for interactions, no subnetwork search, per se, without additional functional annotations |
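The shortest-path idea described for NetBox and ClustEx (connecting significant seed genes through a small number of linker genes) can be illustrated with a minimal sketch. This is not the implementation of either tool: the function names, the `max_linkers` cutoff, and the toy network with node labels `S1`..`S4`, `L1`, `X1`, `X2` are all assumptions made for illustration. It uses breadth-first search on an unweighted graph and keeps a path only if it passes through at most `max_linkers` non-seed nodes.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(adj.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def connect_seeds(adj, seeds, max_linkers=1):
    """Union of seed-to-seed shortest paths using at most max_linkers non-seed nodes."""
    nodes = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(adj, a, b)
        if path is None:
            continue
        linkers = [n for n in path if n not in seeds]
        if len(linkers) <= max_linkers:
            nodes.update(path)
    return nodes


# Hypothetical toy network: S* are seed (significant) genes, L*/X* are not.
edges = [("S1", "L1"), ("L1", "S2"), ("S2", "S3"),
         ("S4", "X1"), ("X1", "X2"), ("X2", "S1")]
adj = build_adjacency(edges)
subnetwork = connect_seeds(adj, {"S1", "S2", "S3", "S4"}, max_linkers=1)
print(sorted(subnetwork))  # ['L1', 'S1', 'S2', 'S3']
```

In this toy case `S4` is left out because reaching it requires two linker genes. Note that a shortest path is not guaranteed to be the path with the fewest linkers, so a real tool would need a more careful search.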
Figure 1. Volcano plots of differential gene expression showing −log10 of the P-values evaluated by DESeq as a function of the log2 fold change (only the range [−6, 6] is shown, 99th percentile). The dots highlighted in red are the genes involved in each subnetwork produced by the eight methods.
Figure 2. ROC curves of −log10 of the P-values predicting the eight subnetworks. The numbers in the brackets are the AUC.
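As a rough illustration of how an AUC of this kind can be computed, the sketch below scores each gene by −log10 of its P-value and treats membership in a subnetwork as the label. The pairwise (Mann-Whitney) formulation, the function name `auc_from_scores`, and the six P-values are assumptions chosen for the example, not data or code from the study.

```python
import math


def auc_from_scores(scores, labels):
    """AUC as P(positive score > negative score) + 0.5 * P(tie).

    Quadratic in the number of genes; fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up P-values and subnetwork membership for six hypothetical genes.
p_values = [1e-8, 1e-5, 0.002, 0.04, 0.3, 0.9]
in_subnetwork = [True, True, False, True, False, False]

scores = [-math.log10(p) for p in p_values]
auc = auc_from_scores(scores, in_subnetwork)
print(round(auc, 4))  # 0.8889
```

An AUC near 1 would mean that genes in the subnetwork tend to have smaller P-values than genes outside it, and an AUC near 0.5 would mean the P-values carry little information about membership.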
Figure 3. Modularity of the eight subnetworks.
Common genes identified by the eight methods.
| | jAM.SA | jAM.GR | DEGAS | BioNet | NetBox | ClustEx | OptDis | NetWalker |
|---|---|---|---|---|---|---|---|---|
| jAM.SA | 1290 | 144 | 182 | 164 | 285 | 158 | 52 | 160 |
| jAM.GR | 0.0936 | 393 | 137 | 168 | 213 | 63 | 40 | 146 |
| DEGAS | 0.1025 | 0.1484 | 667 | 143 | 247 | 85 | 57 | 136 |
| BioNet | 0.1038 | **0.2474** | 0.1462 | 454 | 356 | 78 | 64 | 190 |
| NetBox | 0.1526 | **0.2042** | 0.1925 | **0.3704** | 863 | 162 | 107 | 246 |
| ClustEx | 0.0817 | 0.0557 | 0.0615 | 0.0663 | 0.1079 | 801 | 34 | 115 |
| OptDis | 0.0365 | 0.0743 | 0.0717 | 0.1113 | 0.1137 | 0.0357 | 185 | 50 |
| NetWalker | 0.0872 | 0.1534 | 0.1100 | 0.1961 | 0.1861 | 0.0827 | 0.0595 | 705 |
Notes: The numbers on the diagonal are the numbers of genes in the subnetwork identified by the corresponding method. The numbers above the diagonal are the numbers of genes identified by both of the indicated methods. The numbers below the diagonal are the Jaccard similarities between the gene sets in the subnetworks of the indicated methods (similarities >0.2 are shown in bold).
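The Jaccard similarities in the lower triangle follow from the counts in the rest of the table: the shared count divided by the size of the union. A minimal sketch (the function names are illustrative and not taken from the study's code):

```python
def jaccard_from_counts(size_a: int, size_b: int, shared: int) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| from two set sizes and their overlap."""
    union = size_a + size_b - shared
    if union == 0:
        return 0.0
    return shared / union


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity computed directly from two sets."""
    return jaccard_from_counts(len(a), len(b), len(a & b))


# jAM.SA (1290 genes) vs jAM.GR (393 genes), 144 genes in common.
print(round(jaccard_from_counts(1290, 393, 144), 4))  # 0.0936
```

The printed value matches the jAM.GR/jAM.SA entry below the diagonal.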
Common interactions identified by the eight methods.
| | jAM.SA | jAM.GR | DEGAS | BioNet | NetBox | ClustEx | OptDis | NetWalker |
|---|---|---|---|---|---|---|---|---|
| jAM.SA | 2141 | 118 | 152 | 105 | 234 | 97 | 26 | 90 |
| jAM.GR | 0.0433 | 702 | 133 | 123 | 178 | 18 | 15 | 82 |
| DEGAS | 0.0446 | 0.0668 | 1421 | 105 | 256 | 34 | 27 | 84 |
| BioNet | 0.0397 | **0.1035** | 0.0545 | 609 | 429 | 39 | 46 | 173 |
| NetBox | 0.0686 | 0.0878 | 0.0960 | **0.2549** | 1503 | 100 | 94 | 215 |
| ClustEx | 0.0318 | 0.0107 | 0.0142 | 0.0248 | 0.0415 | 1004 | 12 | 51 |
| OptDis | 0.0110 | 0.0160 | 0.0164 | 0.0567 | 0.0567 | 0.0097 | 249 | 34 |
| NetWalker | 0.0316 | 0.0580 | 0.0394 | **0.1405** | **0.1032** | 0.0292 | 0.0337 | 795 |
Notes: The numbers on the diagonal are the numbers of interactions in the subnetwork identified by the corresponding method. The numbers above the diagonal are the numbers of interactions found by both of the indicated methods. The numbers below the diagonal are the Jaccard similarities between the interaction sets selected by the indicated methods (similarities >0.1 are in bold).
Figure 4. Prediction of the 462 breast cancer genes by the eight subnetworks. F1 score is defined as 2 × precision × recall/(precision + recall).
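The F1 definition in the caption can be written out as a short sketch. Here precision is taken as the fraction of subnetwork genes that are known disease genes and recall as the fraction of known disease genes recovered. The function name and the gene labels are hypothetical.

```python
def precision_recall_f1(predicted: set, known: set):
    """Return (precision, recall, F1) for a predicted gene set against a known set."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 genes in a subnetwork, 3 known disease genes, 2 shared.
subnetwork_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
known_genes = {"GENE_B", "GENE_D", "GENE_E"}

precision, recall, f1 = precision_recall_f1(subnetwork_genes, known_genes)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.5 0.6667 0.5714
```

Because F1 is the harmonic mean of precision and recall, a very large subnetwork with high recall but low precision still receives a low score.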
Figure 5. Number of methods detecting genes and interactions in subnetworks. Histograms of the number of genes (A) and interactions (B) versus the number of methods that detect them. (A) All genes denote the 7,369 genes in the HPRD network. Breast cancer genes are the 462 genes found by KOBAS in multiple disease databases. Both gene counts are scaled to [0, 1] by dividing by the maximum count. The percentage of breast cancer genes is the breast cancer gene count divided by the count of all the genes in each category (genes found by a certain number of methods). (B) All interactions denote the 28,571 interactions in the HPRD network. Breast cancer pathways are the 2,058 interactions found by KOBAS in multiple pathway databases. Both interaction counts are scaled to [0, 1] by dividing by the maximum count. The percentage of breast cancer pathways is the interaction count in breast cancer pathways divided by the total interaction count in each category.
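The counting and scaling described in this caption can be sketched as follows. The method names, gene labels, and helper function names are placeholders, and the tiny input is invented purely to show the steps: count how many methods detect each gene, bin genes by that count, scale the bins by their maximum, and pick out genes found by at least a given number of methods.

```python
from collections import Counter


def detection_counts(subnetworks):
    """Map each gene to the number of methods whose subnetwork contains it."""
    counts = Counter()
    for genes in subnetworks.values():
        counts.update(set(genes))
    return counts


def histogram_by_method_count(counts, n_methods):
    """Number of genes detected by exactly k methods, for k = 1..n_methods."""
    return [sum(1 for c in counts.values() if c == k)
            for k in range(1, n_methods + 1)]


def scale_by_max(values):
    """Scale counts to [0, 1] by dividing by the maximum count."""
    peak = max(values)
    return [v / peak for v in values] if peak else [0.0 for _ in values]


# Invented toy input: three hypothetical methods and four hypothetical genes.
subnetworks = {
    "method_1": {"A", "B", "C"},
    "method_2": {"B", "C"},
    "method_3": {"C", "D"},
}

counts = detection_counts(subnetworks)
hist = histogram_by_method_count(counts, n_methods=len(subnetworks))
scaled = scale_by_max(hist)
core = {gene for gene, c in counts.items() if c >= 2}

print(hist)          # [2, 1, 1]
print(scaled)        # [1.0, 0.5, 0.5]
print(sorted(core))  # ['B', 'C']
```

The same functions apply to interactions if each interaction is represented as a hashable key such as a sorted pair of gene names.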
Figure 6. Prominent subnetwork whose interactions are detected by at least five methods. Node color indicates log2 fold change of differential expression (yellow: upregulated in tumor samples; blue: downregulated in tumor samples). The 12 genes with red borders are in the list of 462 known breast cancer genes. Visualized with Cytoscape version 3.0 (ref. 6).