Jimmy Wu, Alex Khodaverdian, Benjamin Weitz, Nir Yosef.
Abstract
BACKGROUND: Network connectivity problems are abundant in computational biology research, where graphs are used to represent a range of phenomena: from physical interactions between molecules to more abstract relationships such as gene co-expression. One common challenge in studying biological networks is the need to extract meaningful, small subgraphs out of large databases of potential interactions. A useful abstraction for this task has been the family of Steiner Network problems: given a reference "database" graph, find a parsimonious subgraph that satisfies a given set of connectivity demands. While this formulation has proved useful in a number of instances, the next challenge is to account for the fact that the reference graph may not be static. This can happen, for instance, when studying protein measurements in single cells or at different time points, where different subsets of conditions can have a different protein milieu.

RESULTS AND DISCUSSION: We introduce the condition Steiner Network problem, in which we concomitantly consider a set of distinct biological conditions. Each condition is associated with a set of connectivity demands, as well as a set of edges that are assumed to be present in that condition. The goal of this problem is to find a minimal subgraph that satisfies all the demands through paths that are present in the respective condition. We show that introducing multiple conditions as an additional factor makes this problem much harder to approximate. Specifically, we prove that for C conditions, this new problem is NP-hard to approximate to a factor of C − ϵ, for every C ≥ 2 and ϵ > 0, and that this bound is tight. Moving beyond the worst case, we explore a special set of instances where the reference graph grows monotonically between conditions, and show that this problem admits substantially improved approximation algorithms. We also develop an integer linear programming solver for the general problem and demonstrate its ability to reach optimality with instances from the human protein interaction network.
Keywords: Approximation algorithm; NP hard; Protein–protein interaction; Steiner Network
Year: 2019 PMID: 30899321 PMCID: PMC6408827 DOI: 10.1186/s13015-019-0141-z
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
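To make the problem statement in the abstract concrete, the following is a minimal sketch of a feasibility check: given a candidate subgraph, it verifies that every demand of every condition is connected using only edges that are both selected and present in that condition. The function names, the dictionary-based input layout, the use of directed edges, and the toy instance are all illustrative assumptions, not taken from the paper.

```python
from collections import deque


def reachable(edges, source, target):
    """Breadth-first search over directed edges; True if target is reachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def satisfies_all_demands(subgraph_edges, condition_edges, demands):
    """Check a candidate subgraph against the demands of every condition.

    subgraph_edges:  set of directed edges (u, v) chosen as the solution.
    condition_edges: dict mapping condition -> set of edges present in it.
    demands:         dict mapping condition -> list of (source, target) pairs.
    (Hypothetical input layout, chosen for illustration.)
    """
    for cond, pairs in demands.items():
        # Only edges that are selected AND exist in this condition may be used.
        usable = subgraph_edges & condition_edges[cond]
        for s, t in pairs:
            if not reachable(usable, s, t):
                return False
    return True


# Toy instance: the same demand a -> c must be met in two conditions
# whose available edges are disjoint.
condition_edges = {
    1: {("a", "b"), ("b", "c")},
    2: {("a", "d"), ("d", "c")},
}
demands = {1: [("a", "c")], 2: [("a", "c")]}

all_edges = {("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")}
only_first = {("a", "b"), ("b", "c")}
print(satisfies_all_demands(all_edges, condition_edges, demands))
print(satisfies_all_demands(only_first, condition_edges, demands))
```

In this toy instance the full edge set is feasible, while the path that serves condition 1 alone is not, because none of its edges exist in condition 2. The optimization problem asks for the smallest edge set that passes this check.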
Fig. 1 Examples of well-studied network problems (a), and their corresponding extension with multiple conditions (b). The problems shown are: Undirected Steiner Tree, Directed Steiner Network, and Shortest Path, respectively. Yellow nodes and red edges correspond to nodes and edges used in the optimal solutions for the corresponding instances
Approximation bounds for the various Steiner Network Problems in their classic setting and condition setting
| Problem | Classic lower bound | Classic upper bound | Condition lower bound(s) | Condition upper bound(s) |
|---|---|---|---|---|
| Steiner Forest | 1.01 […] | 2 […] | … | 2 … |
| Directed Steiner Network | … | … | … | … |
| Undirected Shortest Path | N/A | 1 | … | … |
| Directed Shortest Path | N/A | 1 | … | … |
| Steiner Tree | 1.01 […] | 1.39 […] | … | 1.39 … |
| Prize-Collecting Steiner Tree | 1.01 […] | 1.97 […] | … | 1.97 … |
For the classic problems, we have indicated the papers in which the bounds are shown. For the condition problems, all the lower bounds are developed in the present work; all the upper bounds are the naive bounds obtained from the “union of shortest paths” heuristic, or from applying the best known approximation algorithm for the appropriate classic Steiner problem to each condition, then taking the union of those solutions
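The "union of shortest paths" heuristic mentioned in this note can be sketched as follows: for each condition, route every demand along a shortest path that uses only that condition's edges, then take the union of all paths. This is a minimal sketch assuming unit edge weights and directed edges; the function names and the toy instance are hypothetical.

```python
from collections import deque


def shortest_path_edges(edges, source, target):
    """Return the edge set of one BFS shortest path, or None if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        # Sorted neighbours make the tie-breaking deterministic.
        for nxt in sorted(adj.get(node, [])):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if target not in parent:
        return None
    path, node = set(), target
    while parent[node] is not None:
        path.add((parent[node], node))
        node = parent[node]
    return path


def union_of_shortest_paths(condition_edges, demands):
    """Union, over all conditions and demands, of per-condition shortest paths."""
    solution = set()
    for cond, pairs in demands.items():
        for s, t in pairs:
            path = shortest_path_edges(condition_edges[cond], s, t)
            if path is None:
                raise ValueError(f"demand {s}->{t} infeasible in condition {cond}")
            solution |= path
    return solution


# Toy instance: condition 1 offers a direct edge a -> c that is absent in
# condition 2, so the two conditions are routed along different paths.
condition_edges = {
    1: {("a", "b"), ("b", "c"), ("a", "c")},
    2: {("a", "b"), ("b", "c")},
}
demands = {1: [("a", "c")], 2: [("a", "c")]}
solution = union_of_shortest_paths(condition_edges, demands)
print(sorted(solution))
```

On this instance the heuristic returns three edges, whereas the two edges a→b and b→c alone would satisfy both conditions. The example illustrates why treating conditions independently yields only a naive upper bound.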
Fig. 2 (Left) A bundle whose upper strand is a chain of two bundles; the lower strand is a simple strand. Contact edges are orange. (Right) Three bundles (blue, green, red indicate different conditions), with one strand from each merged together
Fig. 3 Integer linear program for Single-Source Condition Steiner Network. 1 for v at condition c if v is a target at condition c, for v at condition c if v is the source node at condition c, 0 otherwise
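The figure itself is not reproduced in this record. As a rough guide to what such a program can look like, the following is a standard single-commodity flow formulation that is consistent with the caption's description of a per-node, per-condition right-hand side. It is a sketch under stated assumptions and may differ from the authors' program: the symbols $d_{uv}$ (edge selection), $f^{c}_{uv}$ (flow in condition $c$), $w_{uv}$ (edge weight), $E_c$ (edges present in condition $c$), $k_c$ (number of targets in condition $c$), $\delta^{c}_{v}$, and in particular the value $-k_c$ at the source, are all assumptions introduced here.

```latex
\begin{align*}
\min \quad & \sum_{(u,v) \in E} w_{uv}\, d_{uv} \\
\text{s.t.} \quad
& \sum_{u:\,(u,v) \in E_c} f^{c}_{uv} \;-\; \sum_{w:\,(v,w) \in E_c} f^{c}_{vw} \;=\; \delta^{c}_{v}
  && \forall v \in V,\ \forall c \\
& 0 \le f^{c}_{uv} \le k_c\, d_{uv}
  && \forall (u,v) \in E_c,\ \forall c \\
& d_{uv} \in \{0, 1\}
  && \forall (u,v) \in E \\[4pt]
& \delta^{c}_{v} =
  \begin{cases}
    1    & \text{if } v \text{ is a target at condition } c \\
    -k_c & \text{if } v \text{ is the source at condition } c \\
    0    & \text{otherwise}
  \end{cases}
\end{align*}
```

Under these assumptions, each condition sends $k_c$ units of flow out of its source and each target absorbs one unit, so flow conservation forces a path from the source to every target. The capacity constraint allows flow only on edges that are selected, and restricting the sums to $E_c$ allows flow only on edges present in that condition.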
ILP solve times for some random instances generated by our random model using the Gurobi Python Solver package [37]
| | | | Time to solve |
|---|---|---|---|
| 100 | 1 | 1.0 | 45 s ± 5 s |
| 10 | 1 | 0.25 | 1 m ± 10 s |
| 100 | 1 | 0.25 | 1 m ± 10 s |
| 10 | 1 | 0.75 | 1 m ± 10 s |
| 100 | 1 | 0.75 | 1 m ± 10 s |
| 100 | 10 | 1.0 | 7 m ± 30 s |
| 10 | 10 | 0.25 | 9 m ± 30 s |
| 10 | 10 | 0.75 | 11 m ± 30 s |
| 100 | 10 | 0.25 | 12 m ± 30 s |
| 100 | 10 | 0.75 | 17 m ± 2 m |
| 100 | 100 | 1.0 | 1 h 40 m ± 15 m |
| 10 | 100 | 0.75 | 2 h 30 m ± 12 m |
| 100 | 100 | 0.75 | 4 h ± 40 m |