Wooyoung Kim, Min Li, Jianxin Wang, Yi Pan.
Abstract
BACKGROUND: Molecular-level biological data can be assembled into system-level data in the form of biological networks. Network motifs are defined as over-represented small connected subgraphs in networks, and they have been used in many biological applications. Because network motif discovery is computationally challenging, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important.
Year: 2011 PMID: 22784624 PMCID: PMC3287573 DOI: 10.1186/1752-0509-5-S3-S5
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Shapes and labels for 4-node subgraphs in an undirected network. There are six types of 4-node subgraph in an undirected network. Each type is labeled with the text label generated by Nauty, shown next to its shape.
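The count of six types can be checked with a brute-force enumeration. The sketch below is only an illustration: it does not use Nauty and does not produce Nauty labels. It canonicalises each connected 4-node graph by trying all node permutations, which is feasible only because the graphs are tiny. All function names are hypothetical.

```python
from itertools import combinations, permutations

NODES = range(4)
ALL_EDGES = list(combinations(NODES, 2))  # the 6 possible undirected edges


def is_connected(edges):
    """Return True if the 4-node graph with these edges is connected."""
    adjacency = {v: set() for v in NODES}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(NODES)


def canonical_form(edges):
    """Smallest relabelled edge list over all 24 node permutations."""
    best = None
    for perm in permutations(NODES):
        relabelled = tuple(
            sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        )
        if best is None or relabelled < best:
            best = relabelled
    return best


def connected_subgraph_types():
    """One canonical form per isomorphism class of connected 4-node graphs."""
    classes = set()
    # A connected graph on 4 nodes needs at least 3 edges.
    for k in range(3, len(ALL_EDGES) + 1):
        for edges in combinations(ALL_EDGES, k):
            if is_connected(edges):
                classes.add(canonical_form(edges))
    return sorted(classes, key=len)


types = connected_subgraph_types()
print(len(types))
for t in types:
    print(len(t), "edges:", t)
```

For larger subgraph sizes this permutation-based canonicalisation becomes impractical, which is one reason a dedicated canonical-labelling tool such as Nauty is used.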
Results of 4-node biological network motifs in the DIP Core network
| Algorithm | Motif included in complex | Motif included in function | GO clustering score (BP) | GO clustering score (MF) | GO clustering score (CC) |
|---|---|---|---|---|---|
| ESU | .13 | .205 | .64 | .51 | .61 |
| R | .13 | .208 | .65 | .28 | .46 |
| MF | .15 | .299 | .74 | .57 | .71 |
| EDGEGO-BNM | .21 | | | | |
| EDGEBETWEENNESS-BNM | | .392 | .78 | .60 | .79 |
| NMFGO-BNM | .18 | .360 | .78 | .61 | .75 |
| NMF-BNM | .15 | .230 | .68 | .54 | .64 |
| V | .26 | .330 | .77 | .59 | .75 |
EDGEBETWEENNESS-BNM performs best in the 'motif included in complex' measure, while EDGEGO-BNM performs best in the other measures.
Results of 5-node biological network motifs in the DIP Core network
| Algorithm | Motif included in complex | Motif included in function | GO clustering score (BP) | GO clustering score (MF) | GO clustering score (CC) |
|---|---|---|---|---|---|
| ESU | .07 | .097 | .67 | .51 | .63 |
| R | .07 | .096 | .66 | .52 | .62 |
| MF | .09 | .167 | .75 | .56 | .72 |
| EDGEGO-BNM | .08 | | | | |
| EDGEBETWEENNESS-BNM | | .210 | .81 | .59 | .76 |
| NMFGO-BNM | .08 | .169 | .71 | .59 | .60 |
| NMF-BNM | .13 | .104 | .65 | .53 | .61 |
| V | .08 | .121 | .71 | .50 | .67 |
EDGEBETWEENNESS-BNM performs best in the 'motif included in complex' measure, while EDGEGO-BNM performs best in the other measures.
Results of 4-node biological network motifs in the Y2k network
| Algorithm | Motif included in complex | Motif included in function | GO clustering score (BP) | GO clustering score (MF) | GO clustering score (CC) |
|---|---|---|---|---|---|
| ESU | .501 | .152 | .61 | .21 | .67 |
| R | .491 | .126 | .61 | .23 | .65 |
| MF | .586 | .180 | .65 | .26 | .72 |
| EDGEGO-BNM | .603 | | | .25 | |
| EDGEBETWEENNESS-BNM | | .178 | .82 | .19 | .84 |
| NMFGO-BNM | .609 | .434 | .92 | | |
| NMF-BNM | .819 | .177 | .76 | .26 | .80 |
| V | .638 | .200 | .63 | .26 | .77 |
EDGEBETWEENNESS-BNM performs best in the 'motif included in complex' measure. NMFGO-BNM performs best in the 'MF clustering score' and 'CC clustering score' measures. EDGEGO-BNM performs best in the 'motif included in functional module' measure and in the 'BP, CC clustering score' measures. However, all the algorithms perform poorly in the 'MF clustering score' measure, with scores below .30.
Results of 5-node biological network motifs in the Y2k network
| Algorithm | Motif included in complex | Motif included in function | GO clustering score (BP) | GO clustering score (MF) | GO clustering score (CC) |
|---|---|---|---|---|---|
| ESU | .281 | .083 | .69 | .17 | .76 |
| R | .305 | .090 | .71 | .17 | .77 |
| MF | .431 | .096 | .73 | .21 | .80 |
| EDGEGO-BNM | .362 | | | | |
| EDGEBETWEENNESS-BNM | | .087 | .89 | .13 | .91 |
| NMFGO-BNM | .445 | .257 | .98 | .18 | .96 |
| NMF-BNM | .643 | .073 | .80 | .18 | .83 |
| V | .665 | .089 | .82 | .19 | .85 |
EDGEBETWEENNESS-BNM performs best in the 'motif included in complex' measure, while EDGEGO-BNM performs best in the other measures.
DIP Core: statistical properties, from FANMOD
| Label | Freq (Original) | Mean-Freq (Random) | S-Dev (Random) | Z-score | P-value |
|---|---|---|---|---|---|
| | 1.46% | 5.9e-005% | 3.04e-006 | 4813.3 | < 10^-3 |
| CN | 10.21% | 0.01% | < 10^-6 | 289.09 | < 10^-3 |
| CF | 48.69% | 42.22% | < 10^-6 | 17.31 | < 10^-3 |
| | 0.48% | 0.00% | 0 | undefined | < 10^-3 |
| Cr | 0.47% | 0.23% | < 10^-6 | 16.28 | < 10^-3 |
| CR | 38.65% | 57.54% | < 10^-6 | -52.17 | > 10^-2 |
The significance of each type of 4-node subgraph is assessed from its structural uniqueness. The label is generated by the Nauty program [24], and the corresponding shape is shown in Figure 1. All types except CR are structural network motifs by definition.
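The Z-score column can be read as the standard z-score of the observed frequency against the random ensemble. The sketch below assumes that definition (observed frequency minus random mean, divided by the random standard deviation) and assumes that frequencies are converted from percent to fractions while S-Dev is already a fraction. The function name is hypothetical. With the rounded values displayed in the first row it gives roughly 4800, close to the reported 4813.3, and it returns `None` when the standard deviation is 0, matching the 'undefined' entry.

```python
from typing import Optional


def z_score(freq_original: float, mean_random: float,
            sd_random: float) -> Optional[float]:
    """Standard z-score of an observed frequency against a random ensemble.

    Returns None when the random ensemble has zero standard deviation,
    because the score is then undefined.
    """
    if sd_random == 0:
        return None
    return (freq_original - mean_random) / sd_random


# Rounded values from the first table row, with percentages written as fractions.
z = z_score(1.46e-2, 5.9e-7, 3.04e-6)
print(round(z, 1))

# A type that never occurs in the random networks: mean 0 and deviation 0.
print(z_score(0.48e-2, 0.0, 0.0))
```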
Figure 2. DIP Core network: search ratios based on the subgraph type. The ratio of the frequencies of the types is relatively well preserved, which indicates that our algorithms can also be used for structural network motif discovery. The relative frequencies for each algorithm are plotted as lines of different colors. The horizontal axis indicates the subgraph type for 4-node subgraphs, and the vertical axis shows the relative frequency of each type. The values are shown in the table below the figure.
Y2k: statistical properties, from FANMOD
| Label | Freq (Original) | Mean-Freq (Random) | S-Dev (Random) | Z-score | P-value |
|---|---|---|---|---|---|
| C~ | 4.66% | 4.07e-006% | 9.14e-007 | 51013 | < 10^-3 |
| C^ | 8.91% | < 10^-2 | 4.29e-005 | 2075.1 | < 10^-3 |
| CN | 32.89% | 0.021% | < 10^-6 | 225.64 | < 10^-3 |
| Cr | 0.55% | 1.14% | < 10^-6 | -9.95 | > 10^-2 |
| CF | 19.58% | 41.82% | < 10^-6 | -66.188 | > 10^-2 |
| CR | 33.40% | 57.06% | < 10^-6 | -84.16 | > 10^-2 |
The significance of each type of 4-node subgraph is assessed from its structural uniqueness. The label is generated by the Nauty program [24], and the corresponding shape is shown in Figure 1. In this network, the first three types are detected as network motifs.
Figure 3. Y2k network: search ratios based on the subgraph type. The ratio of the frequencies of the types is relatively well preserved, which indicates that our algorithms can also be used for structural network motif discovery. The plots and the table are laid out as in Figure 2.
Y2k network: the rates of motifs included in a 'rRNA processing' functional module in yeast, computed using equation (1).
| Algorithm | C~ | C^ | CN | Cr | CF | CR |
|---|---|---|---|---|---|---|
| R | .30 | .32 | .34 | .36 | .34 | .34 |
| MF | .78 | .54 | .31 | .38 | .16 | .13 |
| EDGEGO-BNM | .97 | .97 | .98 | 1.0 | .99 | .97 |
| EDGEBETWEENNESS-BNM | .67 | .64 | .32 | .57 | .22 | .16 |
| NMFGO-BNM | .87 | .88 | .78 | .89 | .70 | .73 |
| NMF-BNM | .69 | .39 | .23 | .22 | .12 | .90 |
| V | .53 | .38 | .39 | .39 | .32 | .31 |
Except for ESU, all algorithms search only 30% of the subgraphs in the original network. Nevertheless, EDGEGO-BNM recovers over 90% of the motifs included in the functional module. We note that the non-motif types Cr, CF and CR have a number of instances for this functional match, indicating that structural uniqueness alone is insufficient to reveal biological significance.
Figure 4. After graph modification. The original network (left) and the modified network (right) after removing edges or clustering the graph. The number of clusters and a list of removed edges are provided as the result.
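The kind of modification shown in Figure 4 can be illustrated with a minimal sketch: remove a given list of edges and report the resulting clusters as connected components. How the edges to remove are chosen is not shown here; the toy network, the removed edge, and the function names are all hypothetical.

```python
def remove_edges(edges, to_remove):
    """Return the edge list with the given undirected edges removed."""
    banned = {frozenset(e) for e in to_remove}
    return [e for e in edges if frozenset(e) not in banned]


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Hypothetical toy network: two triangles joined by one bridge edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
removed = [("c", "d")]  # hypothetical choice of edge to remove

modified = remove_edges(edges, removed)
clusters = connected_components(nodes, modified)
print(len(clusters), clusters)
```

Subgraph enumeration would then run on `modified` instead of the original edge list, so subgraphs that span the removed edge are no longer searched.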
Figure 5. GO DAG example. An example view of the GO DAG, where the root node is the molecular function (MF) GO term.
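As a rough illustration of working with depth in such a DAG, the sketch below computes the depth of every term in a toy DAG and keeps the terms at or below a chosen depth. It assumes that depth means the shortest distance from the root, which the text here does not specify. The term names are placeholders rather than real GO identifiers, and the threshold value is arbitrary.

```python
from collections import deque


def term_depths(children, root):
    """Breadth-first search giving each term its shortest distance from the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in depth:  # the first visit in BFS is the shortest path
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth


# Hypothetical toy DAG; a term may have more than one parent.
children = {
    "MF_root": ["binding", "catalytic"],
    "binding": ["protein_binding"],
    "catalytic": ["kinase", "protein_binding"],
    "protein_binding": ["kinase_binding"],
    "kinase": ["kinase_binding"],
}

depths = term_depths(children, "MF_root")
d = 2  # hypothetical depth threshold
deep_terms = sorted(t for t, k in depths.items() if k >= d)
print(depths)
print(deep_terms)
```

Terms close to the root are very general, so a depth threshold of this kind is one way to restrict attention to more specific annotations.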
Various algorithms used for the detection of biological network motifs
| Algorithm | Type | Time before ESU | Parameter | Deterministic |
|---|---|---|---|---|
| EDGEGO-BNM | Edge-Removing | | | Yes |
| EDGEBETWEENNESS-BNM | Edge-Removing | | | Yes |
| NMFGO-BNM | Clustering | | | No |
| NMF-BNM | Clustering | | | No |
| V | Clustering | | | No |
All the algorithms introduced in this paper are compared by type, time before enumeration, parameters, and whether they are deterministic. Here d is the GO depth threshold, l is the number of GO terms associated with the graph G, c is the number of clusters, r is the number of edges to remove, and η and β are the parameters of the sparse NMF computation.