Abby Hill, Scott Gleim, Florian Kiefer, Frederic Sigoillot, Joseph Loureiro, Jeremy Jenkins, Melody K Morris.
Abstract
Computational approaches have shown promise in contextualizing genes of interest with known molecular interactions. In this work, we evaluate seventeen previously published algorithms based on characteristics of their output and their performance in three tasks: cross validation, prediction of drug targets, and behavior with random input. Our work highlights strengths and weaknesses of each algorithm and results in a recommendation of algorithms best suited for performing different tasks.
Year: 2019 PMID: 31860671 PMCID: PMC6944391 DOI: 10.1371/journal.pcbi.1007403
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Overview of network algorithm benchmarking workflow: All algorithms considered in this work required a set of identified genes relevant to a disease, pathway, or treatment (i.e., “start nodes”) as input, while some also required fold changes and/or p-values.
The output differed by algorithm class: subnetwork ID algorithms returned highly connected subnetworks, node prioritization algorithms returned ranked lists of genes, and causal regulator algorithms returned ranked lists of hypotheses, each corresponding to a positive or negative effect of a given gene on the observed data. For node prioritization and causal regulator algorithms, we defined the “output nodes” as the top-ranked nodes, using a rank cutoff equal to the number of input start nodes for each data set. Subnetworks could also be constructed from the interactions among the most highly ranked genes in the output lists. For illustration in this figure, we used the top 100 hits (based on p-value) from a CRISPR survival screen in the KBM7 cell line [7]. Each output network contains genes that were included in the input start node list (blue) as well as genes that were identified by the algorithms (pink).
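The rank cutoff described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the gene labels, and the scores are hypothetical, and the tie-breaking rule (alphabetical) is not taken from the paper.

```python
def select_output_nodes(scores, start_nodes):
    """Return the top-ranked nodes, with a cutoff equal to the number of start nodes.

    `scores` maps node -> score (higher is better). Ties are broken
    alphabetically, which is an assumption made for reproducibility.
    """
    cutoff = len(set(start_nodes))
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:cutoff]


def split_by_origin(output_nodes, start_nodes):
    """Separate output nodes into recovered start nodes and newly identified nodes."""
    start = set(start_nodes)
    recovered = [n for n in output_nodes if n in start]
    novel = [n for n in output_nodes if n not in start]
    return recovered, novel


# Toy data (hypothetical labels and scores, not from the paper).
scores = {"geneA": 0.91, "geneB": 0.85, "geneC": 0.40, "geneD": 0.33}
start_nodes = ["geneA", "geneC"]

output_nodes = select_output_nodes(scores, start_nodes)
recovered, novel = split_by_origin(output_nodes, start_nodes)
print(output_nodes, recovered, novel)
```

In the figure's color scheme, `recovered` would correspond to the blue nodes and `novel` to the pink nodes.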
Algorithms evaluated.
| Algorithm | Category | Network Requirement | Brief Description | Reference |
|---|---|---|---|---|
| Random Walk | Node Prioritization | | Models path of a random walker starting from nodes of interest and walking to other nodes based on edges in the network | [ |
| Network Propagation | Node Prioritization | | Random walk-based approach controlled for degree of nodes | [ |
| ToppNet KM | Node Prioritization | Directed | Random walk-based method with limited number of steps | [ |
| ToppNet HITS | Node Prioritization | Directed | Random walk-based method that also takes into account hubness and authority of nodes | [ |
| Overconnectivity | Node Prioritization | | Enrichment of start nodes and gene sets consisting of each network node's neighbors | N/A |
| Interconnectivity | Node Prioritization | | Enrichment-based method that identifies nodes between other nodes | [ |
| Hidden Nodes | Node Prioritization | | Enrichment-based method that uses shortest paths to identify nodes between other nodes | [ |
| GeneMania | Node Prioritization | | Ranks nodes by topological closeness to start nodes in an integrated network | [ |
| Guilt By Association | Node Prioritization | | Fraction of neighbor nodes that appear in the start node list | [ |
| Neighborhood Scoring | Node Prioritization | | Guilt-by-association-based approach with optional weighting for start nodes | [ |
| Causal Reasoning | Causal Regulator | Signed and Directed | Processes network and calculates directional consistency and overconnectivity with start nodes | [ |
| SigNet | Causal Regulator | Signed and Directed | Processes network and calculates several metrics to infer relationship with start nodes | [ |
| DIAMOnD | Subnetwork ID | | Evaluates overconnectivity enrichment iteratively until it reaches a user-defined number of nodes | [ |
| Pathway Inference | Subnetwork ID | | Heuristic method that identifies subnetworks enriched in start nodes | [ |
| Active Modules | Subnetwork ID | | Memetic algorithm with addition of encoding/decoding scheme and local search operator | [ |
| CASNet | Subnetwork ID | Signed | Considers edge sign to determine relevance to provided start nodes | [ |
| HotNet1 | Subnetwork ID | | Diffusion-based method accounting for FDR | [ |
| HotNet2 | Subnetwork ID | Directed | Extension of HotNet1 approach that incorporates insulated diffusion and edge direction | [ |
| Start Node Links | Subnetwork ID | | Directly extracts connections between start nodes | N/A |
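
As a concrete instance of one table entry, the Guilt By Association description ("fraction of neighbor nodes that appear in the start node list") can be sketched as follows. This is a minimal sketch, not the benchmarked implementation: the adjacency-dictionary representation, the handling of isolated nodes, and all names are assumptions.

```python
def guilt_by_association(adjacency, start_nodes):
    """Score each node by the fraction of its neighbors that are start nodes.

    `adjacency` maps node -> iterable of neighbor nodes (undirected network).
    Self-loops are ignored, and nodes without neighbors score 0.0; both
    choices are assumptions made for this sketch.
    """
    start = set(start_nodes)
    scores = {}
    for node, neighbors in adjacency.items():
        nbrs = set(neighbors) - {node}
        scores[node] = len(nbrs & start) / len(nbrs) if nbrs else 0.0
    return scores


# Toy undirected network (hypothetical).
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}
gba_scores = guilt_by_association(adjacency, start_nodes=["A", "C"])
print(gba_scores)
```

Node `B` scores highest here because two of its three neighbors are start nodes, even though `B` itself is not in the start list.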
Fig 2. Characterizing algorithms using the average fraction of start nodes in the output, indicating the tendency to return start nodes (A, top left), and the degree of output nodes, indicating the tendency to return nodes with many edges (B, top right). Cross-validation performance of algorithms, shown as the fraction of datasets for which the algorithm appeared in the top five when ranked by AUROC (C, bottom left) or fraction recovered (D, bottom right). For the fraction recovered analysis, the top nodes were defined as the 200 top-ranked nodes for node prioritization and causal regulator algorithms or any node present in a subnetwork for subnetwork ID algorithms.
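The two cross-validation metrics named in this caption can be illustrated with a short Python sketch. It assumes that some start nodes are held out and that the algorithm returns a score per node. AUROC is computed here through its pairwise (Mann-Whitney) definition; that is a standard formulation, not necessarily the computation used in the paper. All names and toy values are hypothetical.

```python
def auroc(scores, positives):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for node, s in scores.items() if node in positives]
    neg = [s for node, s in scores.items() if node not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative node")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fraction_recovered(scores, positives, top_n=200):
    """Fraction of held-out nodes that appear among the `top_n` highest-scoring nodes."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))[:top_n]
    return len(set(ranked) & set(positives)) / len(positives)


# Toy example: two held-out start nodes among four scored nodes.
cv_scores = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.1}
held_out = {"A", "C"}

print(auroc(cv_scores, held_out))                        # 0.75
print(fraction_recovered(cv_scores, held_out, top_n=2))  # 0.5
```

For subnetwork ID algorithms, which return a node set rather than a ranking, the caption's definition amounts to replacing the top-ranked list with the set of nodes in the subnetwork.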
Number of nodes ranked in top 200 when algorithms were run with 200 randomly chosen nodes as input start nodes.
| Algorithm | Number of nodes highly ranked in 50% of random input tests | Number of nodes highly ranked in 5% of random input tests |
|---|---|---|
| Causal Reasoning (Pollard Rank) | 64 | 1129 |
| Interconnectivity | 44 | 1042 |
| Hidden Nodes | 0 | 559 |
| SigNet | 200 | 375 |
| Network Propagation | 0 | 309 |
| ToppNet–HITS | 239 | 289 |
| Random Walk | 4 | 200 |
| Guilt by Association | 0 | 119 |
| ToppNet–KM | 0 | 56 |
| Causal Reasoning (Enrichment Rank) | 0 | 0 |
| Overconnectivity | 0 | 0 |
| Neighborhood Scoring | 0 | 0 |
| GeneMania | 0 | 0 |
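
The counts in this table can be reproduced in principle with a procedure like the one sketched below: run an algorithm repeatedly on randomly chosen start nodes, record which nodes land in the top ranks, and count the nodes that do so in at least 50% or 5% of runs. This is a hedged sketch; the `rank_fn` interface, the trial count, and the toy degree-based ranker are assumptions and do not correspond to any of the benchmarked algorithms.

```python
import random
from collections import Counter


def random_input_hits(rank_fn, all_nodes, n_start, top_n, n_trials, seed=0):
    """Count how often each node is ranked in the top `top_n` across random-input trials.

    `rank_fn(start_nodes)` is assumed to return a list of unique nodes,
    best first.
    """
    rng = random.Random(seed)
    nodes = sorted(all_nodes)
    hits = Counter()
    for _ in range(n_trials):
        start = rng.sample(nodes, n_start)
        hits.update(rank_fn(start)[:top_n])
    return hits


def count_frequent(hits, n_trials, min_fraction):
    """Number of nodes highly ranked in at least `min_fraction` of the trials."""
    return sum(1 for count in hits.values() if count >= min_fraction * n_trials)


# Toy ranker that ignores its input and always ranks by degree (fully biased).
degree = {f"n{i}": i for i in range(10)}


def rank_by_degree(start_nodes):
    return sorted(degree, key=lambda node: (-degree[node], node))


hits = random_input_hits(rank_by_degree, degree, n_start=3, top_n=3, n_trials=20)
print(count_frequent(hits, 20, 0.5), count_frequent(hits, 20, 0.05))
```

Because the toy ranker ignores its input, the same three high-degree nodes appear in every trial, so both counts equal `top_n`. An algorithm whose output depends only on the start nodes would instead yield counts near zero, as in the last rows of the table.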
Fig 3. Connectivity Map target prediction in the composite network or metabase signed+directed.
Performance was characterized by the ability of the algorithms to highly rank known targets of drugs. (A, top left) Fraction of datasets for which the algorithm appeared in the top five when ranked by fraction of drug targets recovered. (B, top right) Fraction of datasets for which the algorithm appeared in the top five when ranked by AUROC.
Summary of Algorithm Characteristics and Performance.
“Tunable” indicates that the algorithm contains a tunable parameter directly related to the evaluated aspect. Bold italics are used to indicate algorithms that perform well for the indicated metric, with flanking asterisks distinguishing the top performers.
| Algorithm | Highly ranks start nodes | Output Degree | Highly ranks nodes with random inputs (number of nodes in 50%/5% of test cases) | Number of datatypes for which algorithm is top for gene list extension (AUROC, FR) | Number of networks for which algorithm is top for target prediction task (AUROC, FR) |
|---|---|---|---|---|---|
| Network Propagation | tunable | ||||
| Random Walk | Y, tunable | ||||
| GeneMania | Y | ||||
| Interconnectivity | | High | 44, 1042 | | |
| ToppNet–HITS | Y, tunable | | 239, 289 | | |
| Overconnectivity | | High | | | |
| DIAMOnD | tunable | n/a | |||
| ToppNet–KM | tunable | Low | 0, 0 | ||
| Hidden Nodes | |||||
| Guilt By Association | | Low | | 0, 0 | n/a, 0 |
| Neighborhood Scoring | Y, tunable | Low | 0, 0 | ||
| Pathway Inference | Y, tunable | | n/a | n/a, 0 | n/a, 0 |
| Active Modules | Y, tunable | tunable | n/a | n/a, 0 | n/a, 0 |
| CASNet | Y | | n/a | n/a, 0 | n/a, 0 |
| HotNet1 | Y, tunable | | n/a | n/a, 0 | n/a, 0 |
| HotNet2 | Y, tunable | | n/a | n/a, 0 | n/a, 0 |
| Start Node Links | Y | | n/a | n/a, 0 | n/a, 0 |
| Causal Reasoning | | Low | 64, 1129 (Pollard) | 0, 0 | n/a, 0 |
| SigNet | High | 200, 375 | 0, 0 |