Siamak Zamani Dadaneh, Xiaoning Qian.
Abstract
BACKGROUND AND MOTIVATIONS: Module identification has been studied extensively to gain a deeper understanding of complex systems such as social and biological networks. Modules are often defined as groups of vertices that are topologically cohesive and share similar interaction patterns with the rest of the network. Most existing module identification algorithms assume that the given networks are measured faithfully, without errors. However, in many real-world applications, for example when analyzing protein-protein interaction networks obtained from high-throughput profiling techniques, the data contain significant noise in the form of both false-positive and missing links between vertices. In this paper, we propose a new model for more robust module identification that takes advantage of multiple observed noisy networks: combining information from several sources strengthens the signal shared across networks and improves solution quality.
Keywords: Bayesian clustering; Module identification; Multiple-network clustering; Stochastic block model; Variational Bayes algorithm
Year: 2016; PMID: 26893596; PMCID: PMC4744266; DOI: 10.1186/s13637-016-0038-9
Source DB: PubMed; Journal: EURASIP J Bioinform Syst Biol; ISSN: 1687-4145
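The abstract's central idea, that several noisy observations of the same modular network carry more signal together than any one of them alone, can be illustrated with a small sketch. This is not the paper's Bayesian model: it substitutes a plain majority vote over links for the proposed variational Bayes inference. The noise model (independent link flips) and all function names are assumptions made for illustration only.

```python
import random
from itertools import combinations

def planted_network(n_nodes, n_modules):
    """Noise-free network: two nodes are linked iff they share a module."""
    module = [i % n_modules for i in range(n_nodes)]
    return {(i, j): int(module[i] == module[j])
            for i, j in combinations(range(n_nodes), 2)}

def add_noise(network, noise, rng):
    """Flip each node pair independently with probability `noise`,
    which creates both false-positive and missing links (assumed noise model)."""
    return {pair: edge ^ (rng.random() < noise)
            for pair, edge in network.items()}

def majority_vote(networks):
    """Keep a link only if more than half of the observed networks contain it."""
    threshold = len(networks) / 2
    return {pair: int(sum(net[pair] for net in networks) > threshold)
            for pair in networks[0]}

def n_errors(observed, truth):
    """Number of node pairs whose link status differs from the true network."""
    return sum(observed[pair] != truth[pair] for pair in truth)

rng = random.Random(0)  # fixed seed so the run is reproducible
truth = planted_network(n_nodes=30, n_modules=3)
observed = [add_noise(truth, noise=0.2, rng=rng) for _ in range(5)]
combined = majority_vote(observed)

single_errors = [n_errors(net, truth) for net in observed]
combined_errors = n_errors(combined, truth)
print("errors per noisy network:", single_errors)
print("errors after majority vote:", combined_errors)
```

With five observations at 20% noise, the voted network should contain far fewer wrong links than any single observation. A probabilistic model such as the one proposed in the paper goes further by inferring module memberships directly instead of cleaning links first.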
Fig. 1 Schematic illustration. (a) An example of multiple noisy networks with a coherent modular structure. (b) Graphical representation of the proposed probabilistic model for module identification across multiple networks, an extension of the module identification model for individual networks.
Fig. 2 Performance evaluation with synthetic networks. (a) Examples of noisy networks: the top network has the lowest noise level (5%) and the bottom network has the highest noise level (50%). (b) Performance results of the first set of experiments, where the instilled noise level increases gradually. (c) Performance results of the second set of experiments, where the noise level remains constant.
Fig. 3 Error bar plots of the area under the ROC curve (AUC) for training ratios ranging from 20 to 80%.
Fig. 4 Error bar plots of the area under the precision-recall (PR) curve (AUC) for training ratios ranging from 20 to 80%.
Performance comparison of different algorithms based on SGD golden standard
| Data set | Metric | Multiple network | ClusterOne | Hofman |
|---|---|---|---|---|
| DIP | Acc | | 0.4731 | 0.4561 |
| DIP | Frac | 0.2129 | | 0.1000 |
| DIP | PPV | 0.4648 | | 0.3295 |
| DIP | Sep | | 0.3329 | 0.3146 |
| BioGRID | Acc | | 0.5961 | 0.5549 |
| BioGRID | Frac | 0.2097 | | 0.1871 |
| BioGRID | PPV | 0.4738 | | 0.4612 |
| BioGRID | Sep | | 0.3325 | 0.3505 |
The best indices are highlighted with bold fonts
Performance comparison of different algorithms based on MIPS golden standard
| Data set | Metric | Multiple network | ClusterOne | Hofman |
|---|---|---|---|---|
| DIP | Acc | | 0.3178 | 0.3403 |
| DIP | Frac | 0.2381 | | 0.1111 |
| DIP | PPV | 0.3567 | | 0.2651 |
| DIP | Sep | | 0.2216 | 0.2020 |
| BioGRID | Acc | | 0.4336 | 0.4383 |
| BioGRID | Frac | 0.2975 | | 0.2275 |
| BioGRID | PPV | 0.3713 | | 0.3649 |
| BioGRID | Sep | | 0.2193 | 0.2189 |
The best indices are highlighted with bold fonts
Number of identified protein complexes by different algorithms for DIP and BioGRID data sets
| Data set | Multiple network | ClusterOne | Hofman |
|---|---|---|---|
| DIP | 320 | 328 | 112 |
| BioGRID | 278 | 424 | 189 |
Fig. 5 One example of the protein complexes predicted by Hofman's method and by our multiple-network clustering algorithm. Hofman's method treated the whole set of proteins as a single complex, whereas the proteins colored yellow and dark blue form the complex predicted by our new multiple-network clustering method. The proteins colored yellow are members of the RNA polymerase II mediator protein complex in the SGD golden standard. This figure was produced with [24].