
On the performance of de novo pathway enrichment.

Richa Batra^1,2,3, Nicolas Alcaraz^1,4, Kevin Gitzhofer^5, Josch Pauling^6, Henrik J Ditzel^4,7, Marc Hellmuth^5,8, Jan Baumbach^1,9, Markus List^10.

Abstract

De novo pathway enrichment is a powerful approach to discover previously uncharacterized molecular mechanisms in addition to already known pathways. To achieve this, condition-specific functional modules are extracted from large interaction networks. Here, we give an overview of the state of the art and present the first framework for assessing the performance of existing methods. We identified 19 tools and selected seven representative candidates for a comparative analysis with more than 12,000 runs, spanning different biological networks, molecular profiles, and parameters. Our results show that none of the methods consistently outperforms the others. To mitigate this issue for biomedical researchers, we provide guidelines to choose the appropriate tool for a given dataset. Moreover, our framework is the first attempt for a quantitative evaluation of de novo methods, which will allow the bioinformatics community to objectively compare future tools against the state of the art.


Year: 2017 PMID: 28649433 PMCID: PMC5445589 DOI: 10.1038/s41540-017-0007-2

Source DB: PubMed Journal: NPJ Syst Biol Appl ISSN: 2056-7189

Introduction

Modern high-throughput OMICS technologies (genomics, transcriptomics, proteomics, metabolomics, etc.) are driving the exponential growth of biological data. With continuously decreasing costs, it is now possible to determine the activity of genes, proteins and metabolites, their chemical modification (phosphorylation, methylation, etc.), as well as mutations on a genome scale. For example, the Gene Expression Omnibus (GEO) alone hosts >51,000 study records derived from >1600 organisms.[1] Moreover, many international consortia have gathered data from thousands of samples; for instance, the International Cancer Genome Consortium[2] has collected data on 50 cancer types derived from several OMICS technologies. To gain a holistic view of such huge datasets, network-based analysis has become a promising alternative to traditional enrichment approaches such as gene ontology enrichment[3] or gene set enrichment analysis.[4] The objective is to study condition-specific systemic changes by projecting experimental data on network modules, representing conditionally perturbed biological processes. Despite the inherent noise and incompleteness of currently available networks, network-based analysis has compelling advantages. It can capture the modular interplay of biological entities and processes overlooked in traditional enrichment methods[5] and can further unravel molecular interactions not covered in well-defined pathways.[6] In addition, network-based pathway enrichment methods can unravel crosstalk between pathways or sub-mechanisms that remain undetected in traditional pathway enrichment methods with known pathways such as KEGG or Reactome.[7]

De novo pathway enrichment

Over the years, several approaches have been proposed for network-based analysis of experimental data. While some methods exploit network topology to augment scoring of known pathways[4, 8–16], others rely on experimental data to reconstruct a condition-specific interaction network[17-20]. Here, we focus on methods that perform de novo pathway enrichment. These methods integrate experimental data with a large-scale interaction network, to extract sub-networks enriched in biological entities active in a given experimental dataset. The definition of active depends on the type of dataset, for instance, differential expression in the case of gene expression data. Diverse strategies for de novo pathway enrichment methods have been devised. To summarize the state of the art, we classify existing methods into four categories (Supplementary Fig. 1): (1) aggregate score optimization methods, (2) module cover approaches, (3) score propagation approaches, and (4) clustering-based approaches. Detailed descriptions of these classes are provided in the supplementary material. While several such methods exist (see Table 1), there are no guidelines to aid potential users in the choice of the appropriate tool. To fill this gap, we present a comparative analysis of selected tools across different datasets and parameter settings, as well as propose the first gold standard for quantitative benchmarking of de novo pathway enrichment methods (Fig. 1).

Table 1

List of publicly available de novo network enrichment methods in alphabetical order (February 7, 2016)

Tool	Method	Software	Reference
BioNet*	ASO	app	Ref. 25
ClustEx	Clust.	app	Ref. 32
cMonkey	Clust.	app	Ref. 33
COSINE*	SP	app	Ref. 26
GiGA*	SP	app	Ref. 34
GXNA*	SP	app	Ref. 29
HotNet	SP	app	Ref. 35
jActiveModules	ASO	C-PL	Ref. 36
KeyPathwayMiner*	MC	app, C-PL, WS	Ref. 30
DEGAS*	MC	app	Ref. 37
MEMCover	MC	app	Ref. 38
NetWalker	ASO	app	Ref. 16
NetworkTrail	ASO	WS	Ref. 39
PinnacleZ*	ASO	app, C-PL	Ref. 31
ReactomeViz-MCL	Clust.	C-PL	Ref. 40
RegMOD	SP	app	Ref. 41
ResponseNet	ASO	WS	Ref. 42
SubExtract	ASO	app	Ref. 43
TieDIE	SP	script (Python)	Ref. 44

De novo methods capable of processing gene expression data are sub-divided into four categories: ASO (aggregate score optimization), SP (score propagation), MC (module cover), and Clust. (clustering-based); see supplementary material for details. The availability of the tool as a stand-alone application (app), as a Cytoscape plugin (C-PL), or as a web service (WS) is further indicated.

* Included in our quantitative study (see text for details)

Fig. 1

A typical workflow for de novo pathway enrichment. The underlying hypothesis is that phenotype-specific genes (foreground, FG) are differentially expressed in many case samples compared to a control group (1, 2). By using statistical tests, one can determine which genes are affected by the phenotype (3) and overlay this information on an interaction network (4). De novo pathway enrichment tools aim to extract sub-networks enriched with phenotype-specific FG genes (5). Comparing several such methods is an open issue.


Challenges in evaluating de novo pathway enrichment tools

Pathway enrichment methods belong to the family of unsupervised statistical learning algorithms. In this context, unsupervised means that the quality of potential solutions cannot be assessed objectively unless the optimal solution is known a priori, i.e., a gold standard is needed. In the case of de novo pathway enrichment, such a gold standard consists of genes known to be relevant (or irrelevant) to the condition(s) measured in the experimental data. In other words, a set of foreground genes (FG) and a set of background genes (BG) are required, such that one can utilize the proportion of FG genes (relevant) compared to BG genes (irrelevant) in the extracted sub-networks as a performance measure. In the following, we discuss why such a gold standard is currently unavailable. The disease-specific pathways of the KEGG database[21] could be considered as a potential gold standard, as they cover many known genes (and their interplay) associated with a certain disease or biological function. However, these genes constitute an incomplete FG gene set, since it is unlikely that all disease-associated genes are known. Furthermore, there is no distinction between potential FG and BG gene sets. This bias prohibits a fair and robust evaluation of de novo pathway enrichment tools using such databases. To address this issue, we propose a protocol to generate artificial, bias-free gold standard FG and BG datasets. Our approach relies on two main assumptions:

Signal strength

Signal strength, also known as signal-to-noise ratio, describes the dissimilarity of the expression values for a given set of FG genes compared to the remaining BG genes. The higher the signal strength, the more the expression profiles of the FG genes differ from those of the BG genes. Thus, we can expect pathway enrichment methods to identify them more easily.

Sparsity

Sparsity models the average distance between FG genes in the graph. The lower the sparsity, the more densely the FG genes are distributed over the network, such that pathway enrichment methods are expected to find them more easily.

Analysis strategy

We systematically generated test expression datasets with synthetically preselected FG and BG gene sets. Subsequently, representative de novo network enrichment tools were applied to identify FG-rich (and BG-poor) modules across parameter sets for each tool and for varying problem complexities, i.e. different levels of signal strength and sparsity. We used two common protein-protein interaction networks, namely the human protein reference database (HPRD)[22] and the interlogous interaction database (I2D).[23]
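To make the notion of an FG-rich and BG-poor module concrete, the following minimal sketch scores one extracted sub-network against a known FG set. It assumes the standard definition of the F-measure as the harmonic mean of precision and recall; the function name `f_measure` and the toy gene labels are hypothetical and are not taken from the study's scripts.

```python
def f_measure(extracted, foreground):
    """Return (precision, recall, F-measure) of an extracted module vs. the FG set."""
    extracted, foreground = set(extracted), set(foreground)
    true_pos = len(extracted & foreground)      # FG genes recovered by the module
    if true_pos == 0:
        return 0.0, 0.0, 0.0
    precision = true_pos / len(extracted)       # share of the module that is FG
    recall = true_pos / len(foreground)         # share of the FG set that was found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (hypothetical labels): 3 of 4 FG genes found, plus 3 BG genes.
fg = {"g1", "g2", "g3", "g4"}
module = {"g1", "g2", "g3", "g7", "g8", "g9"}
precision, recall, f1 = f_measure(module, fg)
```

In this toy case half of the module is FG (precision 0.5) and three quarters of the FG set are recovered (recall 0.75), so a module that grows by adding BG genes is penalized through precision.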

Results

The effect of varying signal strength

We generated expression profiles with varying mean (VM) and varying variance (VV) (see Fig. 4 and “Methods” for details). Figure 2a and b illustrate that increasing signal strength generally improves the performance of all tools.

Fig. 4

Illustration of the used models for FG and BG expression distributions generated for cases and control samples: VV in (a) and VM in (b)

Fig. 2

Average performance for over 80 FG sets of size 20 nodes, generated using the AVD algorithm, with varying signal strength (a, b) and varying sparsity (c, d). Expression profiles were simulated with varying mean (VM) (a, c) and varying variance (VV) (b, d). The HPRD network was used as the input network and the performance was assessed using the F-measure. The error bars (a, b) and box plots (c, d) represent performance over several FG nodes and over a range of internal parameter settings for each tool. The higher the signal strength, the more the expression profiles of the FG genes differ from those of the BG genes, indicating that we can expect pathway enrichment methods to identify them more easily. For details on internal parameters, signal strength and sparsity values, please refer to Supplementary Tables 1, 3, and 5, respectively.

BioNet, GXNA, and KeyPathwayMiner (KPM) performed best with VM. However, if the simulation of expression profiles was based on changes in the variance (VV) between FG and BG distributions, the top performers BioNet and GXNA performed poorly. We speculate that this was due to their internal expression profile preprocessing procedures. Thus, one should carefully consider the characteristics of FG and BG expression profiles when choosing the most suitable tool.

The effect of varying sparsity

We used two algorithms to distribute the label FG across the nodes in the network. The seed-and-extend (SAE) algorithm generates connected random sub-networks of a particular size and labels all their nodes as FG. In contrast, the average-distance-k (AVD) algorithm distributes the FG node labels such that their average pairwise distance equals a given k. With increasing k values, the sparsity grows. All other nodes in the network were labeled as BG. We computed the sparsity of the FG nodes with three measures: (a) global FG proximity, the shortest path covering the FG set nodes in the network; (b) global FG connectivity, the average degree of the FG nodes in the network; and (c) local FG density, the graph density of the FG nodes. As expected and depicted in Fig. 2c and d, the performance of all tools decreased with increasing sparsity (decreasing FG proximity, i.e. increasing k using AVD). Supplementary Fig. 2 further illustrates the performance dynamics with increasing FG density and connectivity. Evidently, all tools performed better with FG nodes selected using the SAE method (Supplementary Fig. 3). However, biological processes are complex and interaction networks are incomplete. Genes that are deregulated in a condition/disease may not be directly connected in the cellular network. Consequently, we believe that AVD is generally a better, more natural choice for FG selection. The results for the I2D network are depicted in Supplementary Figs 4 and 5.
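Two of the three sparsity measures have a direct graph-theoretic reading and can be sketched in a few lines. The example below is a minimal illustration, assuming an undirected network stored as a dictionary that maps each node to the set of its neighbors; the function names and the toy network are hypothetical, and global FG proximity is left out because its exact computation is only described in the supplementary material.

```python
def fg_connectivity(adj, fg):
    """Global FG connectivity: average degree of the FG nodes in the full network."""
    return sum(len(adj[v]) for v in fg) / len(fg)


def fg_density(adj, fg):
    """Local FG density: graph density of the sub-graph induced by the FG nodes."""
    fg = set(fg)
    n = len(fg)
    if n < 2:
        return 0.0
    # Every FG-internal edge is seen from both endpoints, hence the division by 2.
    internal_edges = sum(1 for u in fg for v in adj[u] if v in fg) // 2
    return 2 * internal_edges / (n * (n - 1))


# Toy undirected network (hypothetical): path a-b-c-d plus the edge b-e.
toy = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}
connectivity = fg_connectivity(toy, {"a", "b", "c"})
density = fg_density(toy, {"a", "b", "c"})
```

Here the FG nodes have degrees 1, 3 and 2, and two of the three possible FG-internal edges exist, which gives a connectivity of 2.0 and a density of 2/3.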

The effect of varying internal parameters

We can further categorize tools based on the characteristic of their internal parameters (IP). COSINE, GiGA, GXNA, and PinnacleZ restrict the solution to a user-specified size. In contrast, BioNet, DEGAS and KPM allow for noise (exceptions or outliers) in the reported solutions, such that sub-networks can have arbitrary size within certain constraints to allow for noise. We tested each selected method on a wide range of IP values. Overall, the trend was that with increasing IP values the size of the solution increases (Fig. 3). Lower IP values led to solutions with fewer nodes, whereas higher IP values led to larger sub-networks. As expected, the performance (F-measure) decreased towards the extreme ends of the IP range.

Fig. 3

Average performance for over 80 FG sets of size 20 nodes generated using the AVD algorithm. Expression profiles were simulated with VM. The HPRD network was used as the input network. Performance was assessed with the F-measure for a range of internal parameter settings for each tool, i.e. the expected pathway size (M) for GXNA, the number of allowed exceptions (outliers) in a pathway (L) for KPM, and the pathway false discovery rate (FDR) for BioNet.

Summary

Consistently across the two tested networks (HPRD and I2D), BioNet, COSINE, GXNA, and PinnacleZ performed better with expression data generated using VM as compared to VV. DEGAS yielded better results with VV, while the performance of KPM and GiGA was independent of VV and VM. When using VM, GiGA, BioNet, GXNA, and KPM showed better performance with SAE (dense FG nodes) compared to AVD (relatively sparse). COSINE, DEGAS, and PinnacleZ appeared less affected. These trends were conserved across the two networks. Using HPRD as a network base, KPM, DEGAS, and GiGA performed better with VV with both FG node selection algorithms. All tools performed poorly with VV and AVD when using I2D. However, when using SAE instead, KPM and GiGA performed very well. Based on our analysis, we propose the following guidelines for biomedical researchers: (a) if computationally feasible, choose a comprehensive biological network such as I2D to capture a wide range of known or predicted biological interactions; (b) apply several tools to obtain a comprehensive picture of the information captured in the dataset. In any case, a wide range of IP values should be tested. Please note that apart from gene expression profiles, some tools can be used for other types of OMICS data (molecular profiles) such as protein expression profiles or RNA sequencing data.

Discussion and conclusion

De novo pathway enrichment is a powerful method to uncover condition-specific functional modules in molecular biological networks. Existing methods for integrating networks and experimental data have recently been reviewed in Mitra et al.[6] Nevertheless, systematic and quantitative evaluations of the performance of existing methods, and a methodology for performing them, have yet to be published. To establish a basis for such a comparison in the absence of a suitable gold standard, we propose a strategy to assess the performance of de novo pathway enrichment tools on simulated data. We sampled nodes in the network (e.g. genes) that are relevant in a condition from a foreground distribution and non-relevant nodes from a background distribution. By adjusting the sparsity and signal strength of the solutions “hidden” in the network, we were able to adjust the difficulty of the de novo pathway enrichment problem and systematically compare the performance of selected tools. We compared seven tools, representing diverse methodological approaches, to study the influence of data preprocessing and parameter choice on the quality of the solutions. In our analysis, BioNet and KPM often performed best, presumably because these tools leave the preprocessing (computation of p-values from expression data, for instance) to the user. Our results show that, in the absence of a well-defined gold standard, using two or three different tools can provide a comprehensive picture of the information content of the data. In the future, we hope to extend our framework to a platform for meta or ensemble analysis using several tools in conjunction. Such a method could, for instance, combine results from different tools into concatenated (sub-)networks. The framework presented here allows for the first standardized, well-structured and comprehensive evaluation of de novo pathway enrichment methods.
However, the design of an optimal benchmark scheme relies on reasonable model assumptions and is a challenging task. Our conclusions are limited by nature, as we have no a priori gold standard datasets. While our knowledge of disease-relevant genes continuously improves, we still lack sets of non-relevant genes in certain diseases. Nevertheless, we may assume that disease genes are generally closer to each other in a molecular biological network,[24] and that their expression (in general) is differentially distributed compared to non-disease genes. Considering this, we believe that we offer a reasonable test scenario for de novo pathway enrichment methods. However, we acknowledge that existing methods are diverse in methodology as well as in their expected input datatypes and formats. As their performance and ranking vary greatly with their IP and the characteristics of the expression datasets, all rankings should be carefully interpreted. While simulated data are suitable for objective comparisons, we acknowledge that we cannot fully model the complexity of biological systems. A promising way forward would be to use large-scale perturbation experiments, such as shRNA or CRISPR knockdown/knockout studies, in a benchmark.

Methods

Candidate tool selection

For our analysis, we selected methods available as standalone applications, a prerequisite for a systematic and automated analysis across different datasets and parameters. We further restricted our comparison to tools designed for pathway enrichment using gene expression data, since it is the most common application type. The following seven tools fulfill both inclusion criteria: (a) BioNet[25] is an aggregate score optimization method that implements an efficient integer linear programming approach to compute optimal sub-networks. (b) COSINE[26] is a score propagation method that includes not only nodes but also edges in its scoring function. Edges correspond to differential gene–gene co-expression across different groups. (c) DEGAS[27] is a module cover approach that uses a shortest path heuristic to identify sub-networks with at least K “active” nodes, with “active” being defined via the proportion of active cases in the experimental data. (d) GiGA[28] is another score propagation method that first computes local minima to serve as starting points for iteratively building sub-networks with n members and a maximal rank of m. (e) GXNA[29] is a score propagation method that selects random nodes as seeds of candidate sub-networks. These are iteratively extended by adding the neighboring node with the highest score. (f) KPM[30] is another module cover approach inspired by DEGAS. It heuristically extracts maximal connected sub-networks in which all nodes except for at most K are differentially expressed in all but at most L of the given samples. (g) PinnacleZ[31] is an aggregate score optimization method (see Table 1) that extends a set of random seed nodes to a number of sub-networks, which are then filtered by comparing their score distribution with that of randomly generated sub-networks. We provide more details for each of these tools in the supplementary material and briefly summarize some application cases.
Please note that due to unavailability of a standalone application, clustering-based approaches could not be considered in our analysis.

The network data

The human protein–protein interaction (PPI) networks from the HPRD[22] and the I2D[23] were selected as test networks. The HPRD network consists of 9425 nodes and 36,811 edges. The I2D network consists of 17,351 nodes and 217,379 edges. The networks can be modeled as undirected graphs G = (V, E) consisting of a set of vertices V and a set of edges E. To gain insights into how network complexity affects the performance of the selected tools, we chose two networks that differ largely in network complexity: recent interaction networks such as I2D are considerably larger than classical PPI networks such as HPRD (here, nearly twice as many nodes and roughly six times as many edges).

Simulation of gene expression datasets

For each graph G, we split the nodes into FG and BG genes. Next, we simulated expression profiles for a set of 110 samples: 100 patient samples and 10 controls. Please note that the number of samples in each class was chosen arbitrarily. This choice could also be guided by (a) random sampling or (b) GEO GDS datasets, in order to reflect the median experiment size in real case studies. We did not do so in the current study due to limited time and computational power. We will work on this in the future, although we expect the main findings of our study to remain unaffected. We divided the simulation into three cases: (i) patient-FG, (ii) control-FG, and (iii) BG for patients and controls. We used the following models for the simulation:

(a) Varying variance

We simulated the expression profiles for the patient-FG genes and the BG genes from the normal distributions N(μ, v_FG) and N(μ, v_BG), respectively (Fig. 4a). We used the same distribution for (ii) and (iii). μ = 0 was used in all cases. Signal strength was defined by the ratio of v_FG to v_BG. See Supplementary Table 3 for the specific values selected for the variances v_FG and v_BG in each scenario.

(b) Varying mean

In this case, we simulated the expression profiles for the patient-FG genes from a normal distribution N(μ_FG, v), whereas we used N(μ = 0, v) for the BG genes and N(μ_FGC, v) for the control-FG genes (Fig. 4b). Here, we used the setting v = 1 in all cases. Signal strength was defined by the difference μ_FG − μ_FGC. See Supplementary Table 2 for the specific values selected for the means μ_FG and μ_FGC. We further transformed the raw data into tool-specific input (e.g. p-values); the particular pre-processing steps for each tool are specified in the supplementary material.
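The two simulation models described above (varying variance and varying mean) can be illustrated with a short sketch. It is not the code used in the study: the function names, the random seed and the parameter values in the example calls are assumptions chosen for illustration, whereas the actual values are listed in the supplementary tables. Each gene is a row of 110 values, with the 100 patient samples first and the 10 controls last.

```python
import random
from math import sqrt


def _draw(rng, mu, variance, n):
    """Draw n values from N(mu, variance); random.gauss expects a standard deviation."""
    return [rng.gauss(mu, sqrt(variance)) for _ in range(n)]


def simulate_vv(n_fg, n_bg, v_fg, v_bg, n_patients=100, n_controls=10, mu=0.0, seed=0):
    """Varying variance: patient-FG ~ N(mu, v_fg); control-FG and BG ~ N(mu, v_bg)."""
    rng = random.Random(seed)
    fg = [_draw(rng, mu, v_fg, n_patients) + _draw(rng, mu, v_bg, n_controls)
          for _ in range(n_fg)]
    bg = [_draw(rng, mu, v_bg, n_patients + n_controls) for _ in range(n_bg)]
    return fg, bg


def simulate_vm(n_fg, n_bg, mu_fg, mu_fgc, n_patients=100, n_controls=10, v=1.0, seed=0):
    """Varying mean: patient-FG ~ N(mu_fg, v); control-FG ~ N(mu_fgc, v); BG ~ N(0, v)."""
    rng = random.Random(seed)
    fg = [_draw(rng, mu_fg, v, n_patients) + _draw(rng, mu_fgc, v, n_controls)
          for _ in range(n_fg)]
    bg = [_draw(rng, 0.0, v, n_patients + n_controls) for _ in range(n_bg)]
    return fg, bg


# Illustrative settings only (assumed, not taken from the supplementary tables).
fg_vm, bg_vm = simulate_vm(n_fg=20, n_bg=200, mu_fg=2.0, mu_fgc=0.0)
fg_vv, bg_vv = simulate_vv(n_fg=20, n_bg=200, v_fg=4.0, v_bg=1.0)
```

The resulting matrices would still have to be converted into the tool-specific input, for example p-values per gene, before any enrichment method can be run on them.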

Selection of FG and BG nodes

We used two approaches for splitting a given graph G = (V, E) into FG and BG nodes.

Seed and extend approach

This is a simple approach, in which we (a) randomly selected a seed node s, (b) expanded the sub-graph by including the nodes from the neighborhood of s, and (c) continued expanding the neighborhood until the sub-graph had n = 20 nodes or until the number of nodes in the sub-graph remained the same as in the previous iteration (i−1). If the sub-graph would exceed n nodes, we chose n−1 neighbors randomly from the set of neighbors.
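The seed-and-extend steps can be sketched as follows. This is a minimal reading of the description above, not the authors' script: the network is assumed to be a dictionary mapping each node to the set of its neighbors, and when a full neighborhood would overshoot n, the sketch samples just enough neighbors to reach exactly n nodes, which is one possible interpretation of the random neighbor choice.

```python
import random


def seed_and_extend(adj, n=20, rng=None):
    """Return up to n connected nodes grown from one randomly chosen seed node."""
    rng = rng or random.Random()
    selected = {rng.choice(sorted(adj))}        # (a) random seed node
    while len(selected) < n:
        # (b) all neighbors of the current sub-graph that are not yet part of it
        frontier = {v for u in selected for v in adj[u]} - selected
        if not frontier:                        # (c) no growth since last iteration
            break
        room = n - len(selected)
        if len(frontier) > room:                # too many candidates: sample to fit n
            frontier = set(rng.sample(sorted(frontier), room))
        selected |= frontier
    return selected


# Toy network (hypothetical): a ring of 50 nodes labeled 0..49.
ring = {i: {(i - 1) % 50, (i + 1) % 50} for i in range(50)}
fg_nodes = seed_and_extend(ring, n=20, rng=random.Random(1))
```

Because every added node is adjacent to a node that was already selected, the returned FG set is always connected, and it is smaller than n only when the seed lies in a connected component with fewer than n nodes.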

AVD

In this approach, the sparsity of a set of FG nodes in a graph is modeled as the average pairwise shortest distance between them. By generating FG node sets of varying distance k, the sparseness of the sub-networks (or closeness of the FG genes) could be adjusted. Given a network modeled as graph G and a sparsity level k, we define the problem of selecting the FG nodes as the average distance k ± α problem, or AVD for short. The goal was to identify a set of nodes W ⊆ V such that the nodes of W have an average pairwise shortest distance (in G) of k ± α. If not stated otherwise, we set α = 1. Our sparseness function AVD with k = 3, for instance, would thus return a set of nodes (which we will later on treat as FG nodes) having, on average, a pairwise shortest distance to each other between 2 and 4. We introduced the parameter α to allow for inexactness and variability. Finding solutions for AVD is computationally difficult (NP-complete). We first tackled it by using integer-linear programming, but it proved too slow and memory-intensive for graphs with thousands of nodes and edges (see Supplementary Material). To deal with the size of current biological networks, we developed a greedy heuristic running in polynomial time to compute solutions for the AVD problem. The pseudo-code and additional details can be found in the supplementary material.
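The AVD criterion itself is cheap to verify for a given candidate set, even though finding such a set is hard. The sketch below only checks the k ± α condition with breadth-first search; it is not the greedy heuristic from the supplementary material. The function names and the toy path graph are hypothetical, and the graph is assumed to be connected and unweighted.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_pairwise_distance(adj, nodes):
    """Average shortest distance over all pairs of the given nodes (connected graph)."""
    nodes = list(nodes)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    pairs = list(combinations(nodes, 2))
    return sum(dist[u][v] for u, v in pairs) / len(pairs)


def satisfies_avd(adj, nodes, k, alpha=1.0):
    """True if the average pairwise distance of nodes lies within k ± alpha."""
    return abs(average_pairwise_distance(adj, nodes) - k) <= alpha


# Toy network (hypothetical): a path 0-1-2-3-4-5-6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
avg = average_pairwise_distance(path, [0, 3, 6])
ok_k3 = satisfies_avd(path, [0, 3, 6], k=3)
ok_k2 = satisfies_avd(path, [0, 3, 6], k=2)
```

For the nodes 0, 3 and 6 the pairwise distances are 3, 3 and 6, so the average is 4; this set is accepted for k = 3 with α = 1 but rejected for k = 2.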

Availability

All scripts and data used in this study are available online at http://patheval.compbio.sdu.dk.


10. Graph-based iterative Group Analysis enhances microarray interpretation.

Authors: Rainer Breitling; Anna Amtmann; Pawel Herzyk
Journal: BMC Bioinformatics Date: 2004-07-23 Impact factor: 3.169

Cited by: 17 in total

1. Network module identification - A widespread theoretical bias and best practices.

Authors: Iryna Nikolayeva; Oriol Guitart Pla; Benno Schwikowski
Journal: Methods Date: 2017-09-21 Impact factor: 3.608

2. Molecular networks in Network Medicine: Development and applications. (Review)

Authors: Edwin K Silverman; Harald H H W Schmidt; Eleni Anastasiadou; Lucia Altucci; Marco Angelini; Lina Badimon; Jean-Luc Balligand; Giuditta Benincasa; Giovambattista Capasso; Federica Conte; Antonella Di Costanzo; Lorenzo Farina; Giulia Fiscon; Laurent Gatto; Michele Gentili; Joseph Loscalzo; Cinzia Marchese; Claudio Napoli; Paola Paci; Manuela Petti; John Quackenbush; Paolo Tieri; Davide Viggiano; Gemma Vilahur; Kimberly Glass; Jan Baumbach
Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2020-04-19

3. De novo identification of maximally deregulated subnetworks based on multi-omics data with DeRegNet.

Authors: Sebastian Winkler; Ivana Winkler; Mirjam Figaschewski; Thorsten Tiede; Alfred Nordheim; Oliver Kohlbacher
Journal: BMC Bioinformatics Date: 2022-04-19 Impact factor: 3.307

4. NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks.

Authors: Matthew A Reyna; Uthsav Chitra; Rebecca Elyanow; Benjamin J Raphael
Journal: J Comput Biol Date: 2021-01-05 Impact factor: 1.479

5. Multi-omics integration in biomedical research - A metabolomics-centric review. (Review)

Authors: Maria A Wörheide; Jan Krumsiek; Gabi Kastenmüller; Matthias Arnold
Journal: Anal Chim Acta Date: 2020-10-22 Impact factor: 6.558

6. A systems pharmacology approach based on oncogenic signalling pathways to determine the mechanisms of action of natural products in breast cancer from transcriptome data.

Authors: Regan Odongo; Asuman Demiroglu-Zergeroglu; Tunahan Çakır
Journal: BMC Complement Med Ther Date: 2021-06-30