Sharon Bruckner1, Falk Hüffner2, Christian Komusiewicz2.
Abstract
The core-periphery model for protein interaction (PPI) networks assumes that protein complexes in these networks consist of a dense core and a possibly sparse periphery that is adjacent to vertices in the core of the complex. In this work, we aim to uncover a global core-periphery structure for a given PPI network. We propose two exact graph-theoretic formulations for this task, which aim to fit the input network to a hypothetical ground truth network by a minimum number of edge modifications. In one model each cluster has its own periphery, and in the other the periphery is shared. We first analyze both models from a theoretical point of view, showing their NP-hardness. Then, we devise efficient exact and heuristic algorithms for both models and finally perform an evaluation on subnetworks of the S. cerevisiae PPI network.
Keywords: Graph classes; NP-hard problems; Protein complexes
Year: 2015 PMID: 26000032 PMCID: PMC4440566 DOI: 10.1186/s13015-015-0043-7
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. An example input and optimal solutions to CLUSTER EDITING, SPLIT CLUSTER EDITING, and MONOPOLAR EDITING. Dashed edges are edge deletions, bold edges are edge insertions. CLUSTER EDITING and SPLIT CLUSTER EDITING produce the same two clusters, but SPLIT CLUSTER EDITING assigns the blue vertex of the size-four cluster to the periphery. In an optimal solution to MONOPOLAR EDITING, the two blue vertices are in the periphery, which is shared between two clusters. Note that the number of necessary edge modifications decreases from CLUSTER EDITING to SPLIT CLUSTER EDITING to MONOPOLAR EDITING.
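The three target structures can be compared by counting the edge modifications that a given assignment of vertices to cores and peripheries implies. The following is a minimal sketch, not the authors' implementation. It assumes the standard definitions: every core becomes a clique, the periphery is an independent set, edges between a periphery vertex and a core it may attach to are kept as they are, and all other edges are deleted. The function name `edit_cost`, its parameters, and the six-vertex example graph are hypothetical.

```python
from itertools import combinations


def edit_cost(edges, cores, periphery):
    """Count edge modifications implied by a core/periphery assignment.

    edges     -- iterable of (u, v) pairs of a simple undirected graph
    cores     -- list of vertex sets; each core must become a clique
    periphery -- dict mapping each periphery vertex to the index of its
                 cluster (own-periphery model) or to None (shared periphery)
    Assumption: every vertex is in exactly one core or in the periphery.
    """
    edge_set = {frozenset(e) for e in edges}
    core_of = {v: i for i, core in enumerate(cores) for v in core}

    def allowed(u, v):
        # Core-core edges survive only inside one cluster.
        if u in core_of and v in core_of:
            return core_of[u] == core_of[v]
        # The periphery is an independent set.
        if u in periphery and v in periphery:
            return False
        # Periphery-core edge: fine if the periphery is shared (None)
        # or the periphery vertex belongs to that cluster.
        p, c = (u, v) if u in periphery else (v, u)
        return periphery[p] is None or periphery[p] == core_of[c]

    deletions = sum(1 for e in edge_set if not allowed(*e))
    insertions = sum(
        1
        for core in cores
        for u, v in combinations(core, 2)
        if frozenset((u, v)) not in edge_set
    )
    return deletions + insertions


# Hypothetical input: a triangle 1-2-3 with a path 3-4-5-6 attached.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
cost_cluster = edit_cost(edges, [{1, 2, 3}, {4, 5}, {6}], {})
cost_split = edit_cost(edges, [{1, 2, 3}, {5, 6}], {4: 0})
cost_monopolar = edit_cost(edges, [{1, 2, 3}, {5, 6}], {4: None})
```

For these particular assignments the cost drops as the model becomes more permissive, which mirrors the trend noted in the caption. The sketch only evaluates a given assignment; finding an optimal one is the NP-hard part.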
Figure 2. The forbidden induced subgraphs for split graphs ($2K_2$, $C_4$, and $C_5$) and for split cluster graphs ($C_4$, $C_5$, $P_5$, necktie, and bowtie).
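The forbidden-subgraph characterization of split graphs translates directly into a brute-force test that is practical for small graphs. The sketch below is illustrative only and makes two assumptions: a graph is a split graph exactly when it has no induced $2K_2$, $C_4$, or $C_5$, and a split cluster graph is a graph in which every connected component is a split graph. The function names are hypothetical.

```python
from itertools import combinations


def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_split_graph(vertices, edges):
    """Brute-force search for an induced 2K2, C4, or C5."""
    vertices = list(vertices)
    adj = _adjacency(vertices, edges)

    def induced_degrees(subset):
        s = set(subset)
        return [len(adj[v] & s) for v in subset]

    for quad in combinations(vertices, 4):
        degs = induced_degrees(quad)
        # All degrees 1: a perfect matching (2K2).
        # All degrees 2: a 4-cycle (C4).
        if all(d == 1 for d in degs) or all(d == 2 for d in degs):
            return False
    for quint in combinations(vertices, 5):
        # A 2-regular graph on five vertices must be a 5-cycle (C5).
        if all(d == 2 for d in induced_degrees(quint)):
            return False
    return True


def is_split_cluster_graph(vertices, edges):
    """Check that every connected component is a split graph."""
    vertices = list(vertices)
    edges = list(edges)
    adj = _adjacency(vertices, edges)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comp_edges = [(u, v) for u, v in edges if u in comp]
        if not is_split_graph(comp, comp_edges):
            return False
    return True
```

Two disjoint edges form a $2K_2$, so they are not a split graph, but they are a split cluster graph because each component is a single edge. A path on five vertices is connected and contains an induced $2K_2$, which is consistent with $P_5$ appearing in the forbidden list for split cluster graphs.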
Input properties of the process networks
| Network | $n$ | $m$ | $n_{lcc}$ | $m_{lcc}$ | $C$ | $p$ | $A_C$ | $i_g$ |
|---|---|---|---|---|---|---|---|---|
| Cell cycle | 196 | 797 | 192 | 795 | 7 | 148 | 6.3 | 1151 |
| Transcription | 215 | 786 | 198 | 776 | 11 | 54 | 7.5 | 1479 |
| Translation | 188 | 2352 | 186 | 2351 | 5 | 88 | 27.4 | 174 |
Here, $n$ is the number of proteins, without singletons, and $m$ is the number of interactions; $n_{lcc}$ and $m_{lcc}$ are the number of proteins and interactions in the largest connected component; $C$ is the number of CYC2008 complexes that have at least 50% of their proteins, and at least three proteins, in the network; $p$ is the number of network proteins that do not belong to these complexes; and $A_C$ is the average complex size. Finally, $i_g$ is the number of genetic interactions between proteins without physical interaction.
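The four size statistics in the legend above can be computed from a plain edge list. This is a minimal sketch under stated assumptions, not the pipeline used for the table: the input is assumed to be a list of undirected interaction pairs, self-loops are ignored, and the function name `network_stats` is hypothetical. Because vertices are only read from edges, singletons are excluded automatically.

```python
from collections import deque


def network_stats(edges):
    """Return n, m, n_lcc, and m_lcc for an undirected edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2

    # Breadth-first search to find the largest connected component.
    largest, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    queue.append(y)
        seen |= comp
        if len(comp) > len(largest):
            largest = comp

    # Every neighbor of a component vertex lies in the same component,
    # so halving the degree sum counts the component's edges.
    m_lcc = sum(len(adj[v]) for v in largest) // 2
    return {"n": n, "m": m, "n_lcc": len(largest), "m_lcc": m_lcc}


# Hypothetical toy network: a triangle plus one separate interaction.
stats = network_stats([(1, 2), (2, 3), (1, 3), (4, 5)])
```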
Figure 3. Running times for random graphs. Left: SPLIT CLUSTER EDITING; right: MONOPOLAR EDITING. A star indicates an instance that was aborted due to insufficient memory.
Figure 4. Running times of the different ILP formulations for the PPI subnetworks. Left: SPLIT CLUSTER EDITING; right: MONOPOLAR EDITING.
Figure 5. Running times of the best ILP formulations, of the two heuristics, and of LUO and SCAN for the PPI subnetworks.
Solution statistics and average GO term coherence for the process networks
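| | Cell cycle | | | | | | | Transcription | | | | | | | Translation | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | k | K | core size | periphery size | coherence (cluster) | coherence (core) | coherence (periphery) | k | K | core size | periphery size | coherence (cluster) | coherence (core) | coherence (periphery) | k | K | core size | periphery size | coherence (cluster) | coherence (core) | coherence (periphery) |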
| SCE | 321 | 14 | 5 | 7 | 0.60 | 0.64 | 0.40 | 273 | 14 | 6 | 6 | 0.54 | 0.56 | 0.57 | 308 | 6 | 13 | 14 | 0.70 | 0.73 | 0.69 |
| ME | 126 | 24 | 3 | 9 | 0.45 | 0.57 | 0.40 | 106 | 26 | 3 | 7 | 0.50 | 0.60 | 0.54 | 240 | 11 | 8 | 12 | 0.59 | 0.61 | 0.54 |
| SCAN | — | 28 | 5 | 4 | 0.41 | 0.62 | 0.34 | — | 29 | 4 | 3 | 0.48 | 0.59 | 0.47 | — | 5 | 30 | 4 | 0.66 | 0.66 | 0.76 |
| Luo | — | 16 | 9 | 63 | 0.34 | 0.50 | 0.31 | — | 12 | 8 | 41 | 0.40 | 0.52 | 0.38 | — | 4 | 24 | 24 | 0.72 | 0.84 | 0.67 |
| CE | 461 | 28 | 4 | 1 | 0.51 | 0.51 | 0.38 | 392 | 28 | 4 | 1 | 0.56 | 0.57 | 0.68 | 937 | 10 | 11 | 4 | 0.71 | 0.73 | 0.71 |
Here, for each network, k is the number of edge modifications and K is the number of nontrivial clusters. The next two columns give the average size of the core and of the periphery in a nontrivial cluster, and the last three columns give the average coherence within the cluster, core, and periphery, respectively.
Complex detection statistics for the process networks
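| | Cell cycle | | | | Transcription | | | | Translation | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | D | core% | comp% | extra% | D | core% | comp% | extra% | D | core% | comp% | extra% |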
| SCE | 4 | 93/100 | 100/100 | 71/ 67 | 8 | 87/ 88 | 100/100 | 73/ 85 | 4 | 89/100 | 96/ 96 | 50/ 50 |
| ME | 5 | 89/100 | 99/100 | 90/ 89 | 11 | 89/100 | 93/100 | 76/ 79 | 4 | 89/100 | 95/ 96 | 55/ 64 |
| SCAN | 4 | 87/ 91 | 98/100 | 100/100 | 8 | 81/ 84 | 100/100 | 96/100 | 0 | —/— | —/— | —/— |
| Luo | 5 | 76/ 81 | 100/100 | 100/100 | 6 | 89/ 87 | 100/100 | 100/100 | 4 | 84/ 92 | 96/ 96 | 72/ 73 |
| CE | 5 | 80/ 87 | 94/100 | 40/ 0 | 9 | 87/ 91 | 94/100 | 89/100 | 4 | 84/ 92 | 90/ 94 | 55/ 60 |
Here, D is the number of detected complexes. Over the detected complexes, core% is the mean/median percentage of core vertices that belong to the complex, comp% is the mean/median percentage of complex proteins that are in the cluster, and extra% is the mean/median percentage of periphery proteins that are not in the cluster.