
Stability indicators in network reconstruction.

Michele Filosi¹, Roberto Visintainer², Samantha Riccadonna³, Giuseppe Jurman², Cesare Furlanello².

Abstract

The number of available algorithms to infer a biological network from a dataset of high-throughput measurements is overwhelming and keeps growing. However, evaluating their performance is infeasible unless a 'gold standard' is available to measure how close the reconstructed network is to the ground truth. When no gold standard exists, one complementary measure is the stability of these predictions under data resampling. We introduce NetSI, a family of Network Stability Indicators, to quantitatively assess the stability of a reconstructed network in terms of inference variability due to data subsampling. To evaluate network stability, the main NetSI methods combine a global/local network metric with a resampling (bootstrap or cross-validation) procedure. In addition, we provide two normalized variability scores over data resampling to measure edge weight stability and node degree stability, and then introduce a stability ranking for edges and nodes. A complete implementation of the NetSI indicators, including the Hamming-Ipsen-Mikhailov (HIM) network distance adopted in this paper, is available in the R package nettools. We demonstrate the use of the NetSI family by measuring network stability on four datasets with alternative network reconstruction methods. First, the effect of sample size on the stability of inferred networks is studied in a gold standard framework on yeast-like data from the Gene Net Weaver simulator. We also consider the impact of varying modularity on a set of structurally different networks (50 nodes, from 2 to 10 modules), and then of complex feature covariance structure, showing the different behaviours of standard reconstruction methods based on Pearson correlation, Maximal Information Coefficient (MIC) and a False Discovery Rate (FDR) strategy. 
Finally, we demonstrate a strong combined effect of different reconstruction methods and phenotype subgroups on a hepatocellular carcinoma miRNA microarray dataset (240 subjects), and we validate the analysis on a second dataset (166 subjects) with good reproducibility.


Substances: MicroRNAs

Year: 2014 PMID: 24587057 PMCID: PMC3937450 DOI: 10.1371/journal.pone.0089815

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The problem of inferring a biological network structure given a set of high-throughput measurements, e.g. gene expression arrays, has been addressed by a large number of different methods published in the last fifteen years (see [1], [2] for two recent comparative reviews). Solutions range from general purpose algorithms (such as correlation [3] or relevance networks [4]) to methods tailored ad hoc for specific data types. Recent examples include SeqSpider [5] for Next Generation Sequencing data and Sparsity-aware Maximum Likelihood [6] for cis-expression quantitative trait loci (cis-eQTL). However, network reconstruction is an underdetermined problem, since the number of possible interactions is significantly larger than the number of independent measurements [7]. Thus, all algorithms must strike a compromise between reconstruction accuracy and feasibility: simplifications inevitably reduce the precision of the final outcome by introducing a considerable number of false positive links [8], which should then be discarded, e.g., by identifying and removing unwanted indirect relations [9]. Moreover, inference accuracy strongly depends on the assumptions used to choose the best hypothetical model of experimental observations [10]. These issues make the inference problem “a daunting task” [11], not only in terms of devising an effective algorithm, but also in terms of quantitatively interpreting the results obtained. Reconstruction accuracy is far from optimal in many situations, and several pitfalls may occur [12], related to both the methods and the data [13]. In extreme cases, many link predictions are statistically equivalent to random guesses [14]. In particular, it is now widely acknowledged that the size and quality of the data play a critical role in the inference process [15]-[18]. All these considerations support the opinion that network reconstruction should still be regarded as an unsolved problem [19]. 
Given the growing list of available algorithms, efforts have been made to develop methods for the objective comparison of network inference methods, including the identification of current limitations [20], [21] and of their relative strengths and weaknesses [7], [22]. The most systematic effort is probably the international DREAM challenge [23]: DREAM 2012 produced a consensus advocating the integration of predictions from multiple inference methods as an effective strategy to enhance performance [24]. However, algorithm uncertainty has so far been assessed only in terms of performance, i.e., the distance of the reconstructed network from the ground truth, whenever available, while the stability of the methods has been neglected. When no gold standard is available for a given problem, algorithm accuracy cannot be evaluated. In such cases, stability can serve as a rule of thumb for judging the reliability of the resulting network. The performance of a network reconstruction algorithm and the stability/reliability of the network inferred from a specific dataset are two distinct and equally crucial aspects of the network inference process. The best way to optimize both would be to adopt only network reconstruction algorithms with well-characterized performance, i.e., evaluated in cases where the ground truth is known, and to always check stability on the specific data. It is also worth noting that inference stability is unrelated to the (chemical or physical) “stability” of the represented process. We propose to tackle the stability issue by quantifying inference variability with respect to data perturbation and, in particular, data subsampling. If a portion of the data is randomly removed before inferring the network, the resulting graph is likely to differ from the one reconstructed from the whole dataset and, in general, different subsets of data give rise to different networks. 
Thus, in the spirit of applying reproducibility principles to this field, one has to accept that the inferred/non-inferred links are only an estimate, lying within a reasonable probability interval. Here we introduce the Network Stability Indicators (NetSI) family, a set of four indicators that allow the researcher to quantitatively evaluate the reproducibility of the reconstruction process. For a given ratio of removed data and a given amount of (bootstrap [25] or cross-validation) resampling, we propose to assess the mutual distances among all inferred networks and their distances to the network generated by the whole dataset: the smaller the average distance, the more stable the inferred network. Similarly, we propose two indicators for the distribution of variability of link weight and node degree across the generated networks, providing ranked lists of the most stable links and nodes, with the least variable ranked first. The framework for evaluating the stability of the whole network relies on a network distance, but it is independent of the chosen metric. As network distance we use the Hamming-Ipsen-Mikhailov (HIM) distance [26], or its components for demonstration purposes, because it represents a good compromise between local (link-based) and global (structure-based) measures of network comparison. Moreover, the HIM distance can be easily included in pipelines for network analysis [27]. We first show the effect of network modularity and of dataset sample size on both the stability and the accuracy of the network inference process. For this purpose, we create two synthetic datasets with a known gold standard. 
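The idea of ranking links and nodes by their variability across resampled networks can be illustrated with a short Python sketch. It assumes the resampled networks are given as a stack of symmetric weighted adjacency matrices and uses the plain standard deviation across resamplings as a stand-in variability score; the normalized scores actually used by NetSI are defined in Fig. 3 and are not reproduced here. The function name `stability_ranking` is hypothetical.

```python
import numpy as np


def stability_ranking(weights):
    """Rank edges and nodes from least to most variable across resampled networks.

    weights: array of shape (m, n, n) holding the symmetric weighted adjacency
    matrices of m networks inferred on resampled data.
    Assumption: variability is the plain standard deviation across the m
    networks, an illustrative stand-in for the normalized NetSI scores.
    """
    weights = np.asarray(weights, dtype=float)
    _, n, _ = weights.shape
    rows, cols = np.triu_indices(n, k=1)          # each undirected edge once
    edge_sd = weights[:, rows, cols].std(axis=0)
    node_sd = weights.sum(axis=2).std(axis=0)     # variability of the weighted degree
    # ascending order: the least variable (most stable) item is ranked first
    edges = [(int(rows[e]), int(cols[e]), float(edge_sd[e]))
             for e in np.argsort(edge_sd, kind="stable")]
    nodes = [(int(v), float(node_sd[v])) for v in np.argsort(node_sd, kind="stable")]
    return edges, nodes


# Three resampled 3-node networks: edge (0, 1) never changes, edge (1, 2) varies most.
nets = np.zeros((3, 3, 3))
for r, (w02, w12) in enumerate([(0.1, 0.2), (0.3, 0.0), (0.5, 0.9)]):
    nets[r, 0, 1] = nets[r, 1, 0] = 0.5
    nets[r, 0, 2] = nets[r, 2, 0] = w02
    nets[r, 1, 2] = nets[r, 2, 1] = w12
edge_rank, node_rank = stability_ranking(nets)
print([e[:2] for e in edge_rank])   # [(0, 1), (0, 2), (1, 2)]
print([v for v, _ in node_rank])    # [0, 1, 2]
```

Ranking by a single scalar per item keeps the output easy to read; any normalized score can replace the standard deviation without changing the sorting logic.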
The results are demonstrated for several inference algorithms, such as the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE), developed for the reconstruction of gene regulatory networks [28], the Context Likelihood of Relatedness (CLR) approach [29] and the Weighted Gene Correlation Network Analysis (WGCNA) [30]. Then the NetSI indicators are computed on correlation networks inferred from another ad hoc synthetic dataset. We highlight the differences in stability due to the choice of the inference algorithm: two basic correlation measures, and the impact of a permutation-based False Discovery Rate (FDR) filter. Finally, we show the use of NetSI measures in a typical application, comparing the stability of relevance networks inferred on a miRNA microarray dataset with paired tissues extracted from a cohort of 241 hepatocellular carcinoma patients [31], [32]. The data exhibit two phenotypes, one related to disease (tumoral or non-tumoral tissue) and one to patient gender (male or female); we show that four different networks are obtained, each with a different stability, and that the reconstruction method is a serious source of variability for the smaller data subgroups. We then validate the analysis on a second hepatocellular carcinoma dataset (166 subjects) with good reproducibility. All the methods (HIM distance and NetSI indicators) have been implemented in the open source R package nettools, available on CRAN as well as on GitHub at https://github.com/MPBA/nettools.git. For computing efficiency, the software can be used on multicore workstations and on high performance computing (HPC) clusters. Further technical details and preliminary experiments with nettools are available in [33].

Methods

Before defining the NetSI family we briefly summarize the main definitions and properties of the HIM network distance. Moreover, at the end of this section, we provide a short description of the network inference approaches used in the following experiments.

HIM network distance

The HIM distance [26] is a metric for network comparison combining an edit distance (Hamming [34], [35]) and a spectral one (Ipsen-Mikhailov [36]). As discussed in [37], edit distances are local, i.e., they focus only on the portions of the network affected by differences in the presence/absence of matching links. Spectral distances instead evaluate the global structure of the compared topologies, but they cannot distinguish isomorphic or isospectral graphs, which can correspond to quite different conditions in the biological context. Their combination into the HIM distance represents an effective solution for the quantitative evaluation of network differences. Let $N_1$ and $N_2$ be two simple networks on $n$ nodes, described by the corresponding adjacency matrices $A^{(1)}$ and $A^{(2)}$, with entries $A^{(1)}_{ij}, A^{(2)}_{ij} \in \mathbb{F}$, where $\mathbb{F} = \{0, 1\}$ for unweighted graphs and $\mathbb{F} = [0, 1]$ for weighted networks. Denote then by $I_n$ the $n \times n$ identity matrix, by $\mathbb{1}_n$ the $n \times n$ matrix with all entries equal to one and by $\mathbb{0}_n$ the $n \times n$ null matrix with all entries equal to zero. Finally, denote by $E_n$ the empty network with $n$ nodes and no links (with adjacency matrix $\mathbb{0}_n$) and by $F_n$ the undirected full network with $n$ nodes and all possible links (whose adjacency matrix is $\mathbb{1}_n - I_n$). The Hamming distance is defined as

$$H(N_1, N_2) = \sum_{1 \le i \ne j \le n} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right| .$$

To guarantee independence from the network dimension (number of nodes), we normalize the above function by the factor $H(E_n, F_n) = n(n-1)$:

$$H(N_1, N_2) = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right| .$$

When $N_1$ and $N_2$ are unweighted networks, $H(N_1, N_2)$ is just the fraction of different matching links (over the total number of possible links) between the two graphs. In all cases, $0 \le H(N_1, N_2) \le 1$, where the lower bound is attained only for identical networks and the upper bound is reached whenever the two networks are complementary ($A^{(1)} + A^{(2)} = \mathbb{1}_n - I_n$). Among spectral distances, we consider the Ipsen-Mikhailov distance IM, which has been shown to be the most robust in a wide range of situations [37], [38]. 
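The normalized Hamming distance translates directly into array code. The following minimal Python/NumPy sketch (the function name is illustrative) sums the absolute differences between corresponding off-diagonal adjacency entries and divides by $n(n-1)$, so it applies unchanged to weighted networks with weights in $[0, 1]$.

```python
import numpy as np


def hamming_distance(a1, a2):
    """Normalized Hamming distance between two networks on the same n nodes.

    a1, a2: (n, n) adjacency matrices, binary or weighted in [0, 1].
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    if a1.shape != a2.shape or a1.ndim != 2 or a1.shape[0] != a1.shape[1]:
        raise ValueError("expected two square adjacency matrices of equal size")
    n = a1.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)          # sum over i != j only
    return float(np.abs(a1 - a2)[off_diagonal].sum() / (n * (n - 1)))


n = 4
empty = np.zeros((n, n))                           # E_n
full = np.ones((n, n)) - np.eye(n)                 # F_n
path = np.zeros((n, n))
for i in range(n - 1):                             # path 0-1-2-3
    path[i, i + 1] = path[i + 1, i] = 1.0

print(hamming_distance(empty, full))               # 1.0
print(hamming_distance(path, full - path))         # 1.0 (complementary networks)
print(hamming_distance(path, empty))               # 0.5 (3 of 6 possible links differ)
```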
Originally introduced in [36] as a tool for network reconstruction from its Laplacian spectrum, the Ipsen-Mikhailov metric follows the dynamical interpretation of an $n$-node network as an $n$-atom molecule connected by identical elastic strings, where the pattern of connections is defined by the adjacency matrix of the corresponding network. In particular, the links between nodes in the network correspond to the bonds between atoms in the dynamical system, and the adjacency matrix is its topological description. We summarize here the mathematical details of the IM definition [36]. The dynamical properties of the oscillatory system are described by the set of differential equations

$$\ddot{x}_i + \sum_{j=1}^{n} A_{ij} \, (x_i - x_j) = 0, \qquad i = 1, \dots, n,$$

where $x_i$ are the coordinates of the physical molecules. Since the adjacency matrix depends on the node labeling, we consider instead the Laplacian matrix $L$, which for an undirected network is defined as the difference between the degree matrix $D$ (the diagonal matrix with vertex degrees as entries) and $A$: $L = D - A$. $L$ is positive semidefinite and singular [39]–[42], and its set of eigenvalues $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$, i.e., the spectrum of the associated graph, provides the natural vibrational frequencies of the system described by these equations: $\omega_i = \sqrt{\lambda_i}$, with $\omega_0 = 0$. The spectral density of a graph can be written as the sum of Lorentz distributions

$$\rho(\omega, \gamma) = K \sum_{i=1}^{n-1} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2},$$

where $\gamma$ is the common width and $K$ is the normalization constant defined as

$$K = \frac{1}{\sum_{i=1}^{n-1} \int_0^{\infty} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2} \, d\omega},$$

so that $\int_0^{\infty} \rho(\omega, \gamma) \, d\omega = 1$. The scale parameter $\gamma$ specifies the half-width at half-maximum, which is equal to half the interquartile range. From the above definitions, the spectral distance between two graphs $N_1$ and $N_2$ with densities $\rho_1(\omega, \gamma)$ and $\rho_2(\omega, \gamma)$ can then be defined as

$$\epsilon_\gamma(N_1, N_2) = \sqrt{\int_0^{\infty} \left[ \rho_1(\omega, \gamma) - \rho_2(\omega, \gamma) \right]^2 d\omega}.$$

The highest value of $\epsilon_\gamma$ is reached, for a given number of nodes $n$, when evaluating the distance between $E_n$ and $F_n$. Defining $\bar{\gamma}$ as the (unique [26]) solution of

$$\epsilon_{\bar{\gamma}}(E_n, F_n) = 1,$$

we can now define the normalized Ipsen-Mikhailov distance IM as

$$\mathrm{IM}(N_1, N_2) = \epsilon_{\bar{\gamma}}(N_1, N_2),$$

so that $0 \le \mathrm{IM}(N_1, N_2) \le 1$, with the upper bound attained only for $\{N_1, N_2\} = \{E_n, F_n\}$. 
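The IM definition involves two numerical steps: integrating the squared difference of the Lorentzian spectral densities, and solving for the width at which the distance between the empty and the full network equals 1. The Python/NumPy sketch below makes both explicit under assumptions of its own: a trapezoidal rule on a uniform grid truncated 50 units beyond the largest frequency, the closed-form integral $\pi/2 + \arctan(\omega_i/\gamma)$ of each Lorentzian over $[0, \infty)$ for the normalization constant, and log-scale bisection for the width. It is not the nettools implementation, and all function names are illustrative.

```python
import numpy as np


def laplacian_frequencies(adj):
    """Frequencies sqrt(lambda_i), i = 1..n-1, of the graph Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    lam = np.clip(np.linalg.eigvalsh(lap), 0.0, None)  # clear tiny negative round-off
    return np.sqrt(lam[1:])                            # eigvalsh is ascending; drop lambda_0 = 0


def spectral_density(freqs, grid, gamma):
    """Sum of Lorentzians centred on freqs, normalized to unit mass on [0, inf)."""
    lorentz = gamma / ((grid[:, None] - freqs[None, :]) ** 2 + gamma ** 2)
    # exact integral over [0, inf) of each Lorentzian: pi/2 + arctan(omega_i / gamma)
    k = 1.0 / np.sum(np.pi / 2 + np.arctan(freqs / gamma))
    return k * lorentz.sum(axis=1)


def spectral_distance(f1, f2, gamma, step=1e-3, tail=50.0):
    """epsilon_gamma: L2 distance between the two densities (trapezoidal rule)."""
    omega_max = max(f1.max(initial=0.0), f2.max(initial=0.0)) + tail
    grid = np.arange(0.0, omega_max, step)
    sq = (spectral_density(f1, grid, gamma) - spectral_density(f2, grid, gamma)) ** 2
    return float(np.sqrt(step * (sq.sum() - 0.5 * (sq[0] + sq[-1]))))


def calibrate_gamma(n, lo=1e-3, hi=10.0, iters=50):
    """Width at which the distance between E_n and F_n equals 1 (log-scale bisection)."""
    f_empty = np.zeros(n - 1)               # Laplacian of E_n: all eigenvalues are 0
    f_full = np.full(n - 1, np.sqrt(n))     # Laplacian of F_n: eigenvalue n, multiplicity n - 1
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if spectral_distance(f_empty, f_full, mid) > 1.0:
            lo = mid                        # peaks too sharp: the solution has a larger width
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


def ipsen_mikhailov(a1, a2, gamma=None):
    """Normalized Ipsen-Mikhailov distance between two networks on n nodes."""
    if gamma is None:
        gamma = calibrate_gamma(np.asarray(a1).shape[0])
    return spectral_distance(laplacian_frequencies(a1), laplacian_frequencies(a2), gamma)


n = 5
empty, full = np.zeros((n, n)), np.ones((n, n)) - np.eye(n)
star = np.zeros((n, n))
star[0, 1:] = star[1:, 0] = 1.0
perm = np.random.default_rng(0).permutation(n)
relabelled = star[np.ix_(perm, perm)]       # isomorphic copy of the star
gamma = calibrate_gamma(n)
print(round(ipsen_mikhailov(empty, full, gamma), 3))    # 1.0 by construction
print(ipsen_mikhailov(star, relabelled, gamma) < 1e-6)  # True: isomorphic graphs are not distinguished
```

The last line illustrates the limitation of spectral distances mentioned earlier: relabelling the nodes leaves the spectrum, and hence IM, unchanged.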
Finally, the generalized HIM distance is defined by the one-parameter family of product metrics combining, with a factor $\xi$, the normalized Hamming distance H and the normalized Ipsen-Mikhailov distance IM, further normalized by the factor $1/\sqrt{1+\xi}$ to set its upper bound to 1:

$$\mathrm{HIM}_\xi(N_1, N_2) = \frac{1}{\sqrt{1+\xi}} \sqrt{H(N_1, N_2)^2 + \xi \cdot \mathrm{IM}(N_1, N_2)^2}.$$

Obviously, $\mathrm{HIM}_0 = H$ and $\lim_{\xi \to +\infty} \mathrm{HIM}_\xi = \mathrm{IM}$. For example, the flexibility introduced by $\xi$ can be used to focus attention more on structure than on local editing changes when $\mathrm{HIM}_\xi$ is used to generate a kernel function for classification tasks (e.g., on brain networks). In what follows we mostly deal with the case $\xi = 1$ and omit the subscript for brevity. The relative effect of the two components is exemplified in Fig. 1A-D. The three small networks (5 nodes) in Fig. 1A differ from each other in only two edges, but two of them are isomorphic and structurally different from the third, as correctly picked up by the HIM distance (see table in Fig. 1C). Similarly, HIM, H and IM provide different values when four edges are cut from the larger (50-node) network in Fig. 1B at different levels of the graph structure. Larger effects are caused by the elimination of the four red edges connecting the four submodules, with differences up to 10 times larger for IM than for H (see table in Fig. 1D).
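Once H and IM are known, the generalized HIM distance with weight $\xi$ is a one-line formula. A minimal Python sketch, with the illustrative name `him`, taking precomputed H and IM values in $[0, 1]$:

```python
import math


def him(h, im, xi=1.0):
    """Generalized HIM distance from precomputed H and IM values in [0, 1]."""
    if xi < 0:
        raise ValueError("xi must be non-negative")
    # weighted Euclidean norm of (H, IM), rescaled so that the maximum value is 1
    return math.sqrt(h ** 2 + xi * im ** 2) / math.sqrt(1.0 + xi)


print(round(him(0.3, 0.7, xi=0.0), 6))    # 0.3: xi = 0 gives the pure Hamming distance
print(him(1.0, 1.0))                      # 1.0: the upper bound
print(round(him(0.2, 0.6, xi=1e9), 6))    # 0.6: dominated by IM for large xi
```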

Figure 1

HIM distance: contribution of H and IM.


(A) An example on three 5-node networks mutually differing by two links. (B) An example on the four-module network defined in Subsection “Stability is modularity invariant”, together with three variants obtained by removing the four red links, the four green links and the four blue links, respectively. (C) The mutual distances between the pairs of networks in (A). (D) The distances between the network in (B) and each of its three variants. In both cases the compared pairs have the same Hamming distance but different spectral structures, thus resulting in different Ipsen-Mikhailov distances.

The HIM distance can be represented in the Hamming/Ipsen-Mikhailov space, where a point $P$ represents the distance between two networks $N_1$ and $N_2$, with coordinates $H(N_1, N_2)$ and $\mathrm{IM}(N_1, N_2)$, so that the Euclidean norm of $P$ is $\sqrt{2}$ times the HIM distance: $\|P\|_2 = \sqrt{2} \, \mathrm{HIM}(N_1, N_2)$. The same holds for weighted networks, provided that the weights range over $[0, 1]$. In Fig. 2 we provide an example of this representation by evaluating the HIM distance between networks of four nodes, namely networks A, B, E (empty) and F (full) in the left panel of Fig. 2. If the Hamming/Ipsen-Mikhailov space is roughly split into four quadrants I, II, III and IV, then two networks whose distance is mapped in quadrant I are close both in terms of matching links and of structure, while those falling in quadrant III differ with respect to both characteristics. Networks corresponding to a point in quadrant II have many common links but different structures, while a point in quadrant IV indicates two networks with few common links but similar structure.
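The quadrant reading of the Hamming/Ipsen-Mikhailov space can be written down as a small helper. In this Python sketch the split is placed at 0.5 on both axes, which is an assumption of the sketch rather than a value from the text; the function name `him_plane_position` is hypothetical. It also returns the Euclidean norm of the point, which for $\xi = 1$ equals $\sqrt{2}$ times HIM.

```python
import math


def him_plane_position(h, im, split=0.5):
    """Return (norm, quadrant) of the point (H, IM) in the Hamming/IM plane.

    Assumption: quadrants are separated at `split` on both axes.
    """
    norm = math.hypot(h, im)          # equals sqrt(2) * HIM when xi = 1
    similar_links = h < split
    similar_structure = im < split
    if similar_links and similar_structure:
        quadrant = "I"                # close in matching links and in structure
    elif similar_links:
        quadrant = "II"               # many common links, different structure
    elif similar_structure:
        quadrant = "IV"               # few common links, similar structure
    else:
        quadrant = "III"              # different in both respects
    return norm, quadrant


norm, quadrant = him_plane_position(0.3, 0.4)
print(round(norm, 6), quadrant)       # 0.5 I
```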

Figure 2

An example of HIM distance.


Representation, in the Hamming (H axis)/Ipsen-Mikhailov (IM axis) distance space, of the HIM distances between network A and networks B, E and F, where E is the empty network and F is the fully connected one. Full mathematical details about the HIM distance and its two components H and IM can be found in [26].

The Network Stability Indicators (NetSI)

The mathematical and operational definitions of the four NetSI indicators are given in Fig. 3. The first two, the stability indicator and the internal stability indicator, concern the stability of the whole reconstructed network. The former measures the distances between the network inferred on the whole dataset and the networks inferred from the resampled subsets; the latter measures all the mutual distances among the networks inferred from the resampled subsets. The other two indicators, the edge weight stability indicator and the node degree stability indicator, concern instead the stability of single links and nodes, in terms of the variability of their weight and degree, respectively, across the resampled networks. In all cases, smaller indicator values correspond to more stable objects.
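The two whole-network indicators reduce to averages of network distances. The Python sketch below assumes the networks have already been inferred and summarizes each set of distances by its mean, following the principle that a smaller average distance means a more stable network; the exact definitions are those of Fig. 3. Any network distance can be plugged in through `dist`; for self-containment the example uses a normalized Hamming distance instead of HIM, and all names are illustrative.

```python
import itertools
import numpy as np


def hamming(a1, a2):
    """Normalized Hamming distance (sum over i != j, divided by n(n-1))."""
    n = a1.shape[0]
    off = ~np.eye(n, dtype=bool)
    return float(np.abs(a1 - a2)[off].sum() / (n * (n - 1)))


def whole_network_stability(full_net, resampled_nets, dist=hamming):
    """Mean distance to the whole-data network, and mean mutual distance."""
    to_full = [dist(full_net, net) for net in resampled_nets]
    mutual = [dist(a, b) for a, b in itertools.combinations(resampled_nets, 2)]
    return float(np.mean(to_full)), float(np.mean(mutual))


full_net = np.zeros((3, 3))
one_edge = np.zeros((3, 3))
one_edge[0, 1] = one_edge[1, 0] = 1.0
stab, internal = whole_network_stability(full_net, [full_net.copy(), one_edge])
print(round(stab, 4), round(internal, 4))   # 0.1667 0.3333
```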

Figure 3

Definition of the NetSI family.

We adopt the HIM distance as network metric, except for the first experiment, where we also show the stability computed with its H and IM components. As the HIM distance is also defined on directed networks, the extension of the NetSI family to the directed case is straightforward. A graphical representation of the procedure is provided in Fig. 4. For all experiments reported in this paper, we used leave-one-out resampling (leave-one-out stability, LOO for short) and different instances of k-fold cross validation (discarding the test portion) for several values of k.
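The two resampling schemes can be sketched as index generators. This Python sketch returns the sample indices kept in each resampled dataset: one subset per left-out sample for LOO, and the training portions of repeated, shuffled k-fold splits with the test folds discarded. The function names and the round-robin fold assignment after shuffling are assumptions of the sketch, not the nettools implementation.

```python
import random


def loo_subsets(n_samples):
    """Leave-one-out: n_samples subsets, the j-th one omitting sample j."""
    return [[i for i in range(n_samples) if i != j] for j in range(n_samples)]


def kfold_training_subsets(n_samples, k, repeats, seed=0):
    """Training portions of `repeats` shuffled k-fold splits; test folds are discarded."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]   # k folds of near-equal size
        for test_fold in folds:
            held_out = set(test_fold)
            subsets.append(sorted(i for i in order if i not in held_out))
    return subsets


subsets = kfold_training_subsets(10, k=5, repeats=3)
print(len(subsets), {len(s) for s in subsets})   # 15 {8}
```

Each repetition leaves every sample out exactly once, so with three repetitions every sample is missing from exactly three of the fifteen subsets.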

Figure 4

Graphical description of the pipeline in Fig. 3.

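Using the chosen inference algorithm, the network is first reconstructed from the whole dataset, with its samples and features (nodes). Given two integers, namely the number of resampled datasets and the subset size, a set of datasets is generated by choosing for each of them a subset of samples from the whole dataset, and the corresponding networks are inferred with the same algorithm. Finally, the four indicators (stability, internal stability, edge weight stability and node degree stability) are computed according to their definitions.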

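The pipeline can be sketched end to end in Python. The sketch makes several assumptions of its own: samples are rows and features are columns, the inference step is a toy absolute-Pearson-correlation network standing in for any of the algorithms discussed here, subsets are drawn without replacement, the distance is a normalized Hamming distance rather than HIM, and only the two whole-network indicators are computed, each summarized by its mean. All names are illustrative.

```python
import itertools
import numpy as np


def correlation_network(data):
    """Toy inference step: weighted network of absolute Pearson correlations."""
    w = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(w, 0.0)
    return w


def hamming(a1, a2):
    """Normalized Hamming distance (sum over i != j, divided by n(n-1))."""
    n = a1.shape[0]
    off = ~np.eye(n, dtype=bool)
    return float(np.abs(a1 - a2)[off].sum() / (n * (n - 1)))


def netsi_pipeline(data, subset_size, n_resamplings,
                   infer=correlation_network, dist=hamming, seed=0):
    """Infer on all samples, re-infer on random subsets, summarize the distances."""
    rng = np.random.default_rng(seed)
    full_net = infer(data)
    nets = []
    for _ in range(n_resamplings):
        rows = rng.choice(data.shape[0], size=subset_size, replace=False)
        nets.append(infer(data[rows]))
    stability = np.mean([dist(full_net, net) for net in nets])
    internal = np.mean([dist(a, b) for a, b in itertools.combinations(nets, 2)])
    return float(stability), float(internal)


data = np.random.default_rng(1).normal(size=(40, 6))   # 40 samples, 6 features
print(netsi_pipeline(data, subset_size=30, n_resamplings=10))
```

Passing a different `infer` or `dist` callable swaps the reconstruction method or the network metric without touching the resampling logic, mirroring the metric-independence of the framework.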

Stability of network inference algorithms

As a first application, we test the difference in stability of the reconstruction process for a set of alternative co-expression network inference algorithms. Arguably the best-known representative of the correlation-based approaches is the Weighted Gene Correlation Network Analysis (WGCNA) [30], [43]. In this case the co-expression similarity is defined as a function of the absolute correlation. As similarity score we adopt: (i) the simple absolute Pearson correlation (labelled “cor”); (ii) a more sophisticated version with soft-thresholding, i.e., the similarity is defined as a power of the absolute correlation (we adopt the default value six, as in the WGCNA R package); (iii) the biweight midcorrelation (“bicor” for short) [30], [44], which is more robust to outliers than the Pearson correlation; and (iv) the Maximal Information Coefficient (labelled MIC). MIC is a recent association measure based on mutual information and belongs to the Maximal Information-based Nonparametric Exploration (MINE) statistics [44]–[48]. In all cases we obtain a weighted network with link strength ranging from 0 to 1. The Topological Overlap Measure (TOM) replaces the original co-expression similarity with a measure of interconnectedness (between pairs of nodes) based on shared neighbors [30], [43]. TOM can be seen as a filter for cutting away weak connections, thus leading to more robust networks than WGCNA. The Context Likelihood of Relatedness (CLR) approach [29] scores the interactions using the mutual information between the corresponding gene expression levels, coupled with an adaptive background correction step. Although suboptimal when the number of nodes is much larger than the number of samples, CLR has been observed to perform well in terms of prediction accuracy, and some CLR predictions in the literature were recently validated experimentally [49]. 
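The soft-thresholding and topological overlap steps can be illustrated in a few lines of Python/NumPy. The sketch follows the standard unsigned TOM formula, $\mathrm{TOM}_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ with $l_{ij} = \sum_u a_{iu} a_{uj}$ and $k_i = \sum_u a_{iu}$; it is a sketch, not the WGCNA package code, and the function names are illustrative.

```python
import numpy as np


def soft_threshold_adjacency(data, power=6):
    """Unsigned adjacency |cor|^power between features (samples in rows)."""
    adj = np.abs(np.corrcoef(data, rowvar=False)) ** power
    np.fill_diagonal(adj, 0.0)
    return adj


def topological_overlap(adj):
    """Unsigned TOM; with a zero diagonal, adj @ adj sums only over shared neighbours u != i, j."""
    adj = np.asarray(adj, dtype=float)
    shared = adj @ adj
    degree = adj.sum(axis=1)
    tom = (shared + adj) / (np.minimum.outer(degree, degree) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


# Binary example: triangle 0-1-2 plus a pendant node 3 attached to node 2.
a = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    a[i, j] = a[j, i] = 1.0
t = topological_overlap(a)
print(t[0, 1], t[0, 3], t[2, 3])   # 1.0 0.5 1.0
```

Nodes 0 and 3 are not linked but share neighbour 2, so they receive a non-zero overlap: this is how TOM rewards shared neighbourhoods rather than direct similarity alone.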
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) is another approach relying on mutual information, originally developed for inferring regulatory networks of mammalian cells [28]. It starts with a graph in which each pair of nodes is connected if their association is above a chosen threshold. To reduce the false positives that usually affect co-expression networks, the Data Processing Inequality (DPI) is then applied to remove the weakest edge of each triplet, thus pruning the majority of indirect links. A single interface to all the mentioned algorithms is integrated in the stability analysis tools of the nettools package, based on their Bioconductor and CRAN implementations: minet for ARACNE and CLR, WGCNA for WGCNA, TOM and bicor, and minerva for MIC.
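The DPI pruning step can be sketched as follows. This Python sketch assumes a symmetric mutual information matrix as input; it marks the weakest edge of every fully connected triplet in the thresholded network and removes all marked edges at the end, with a tolerance `eps` that defaults to zero. It illustrates the idea and is not the minet code; the function name is illustrative.

```python
import itertools
import numpy as np


def aracne_dpi(mi, threshold=0.0, eps=0.0):
    """Threshold a symmetric MI matrix, then prune it with the Data Processing Inequality."""
    mi = np.asarray(mi, dtype=float)
    net = np.where(mi > threshold, mi, 0.0)      # keep associations above the threshold
    np.fill_diagonal(net, 0.0)
    n = net.shape[0]
    remove = np.zeros((n, n), dtype=bool)
    for i, j, k in itertools.combinations(range(n), 3):
        triplet = sorted([(net[i, j], i, j), (net[i, k], i, k), (net[j, k], j, k)])
        w_min, a, b = triplet[0]
        w_mid = triplet[1][0]
        # only fully connected triplets; the weakest edge must be weaker by more than eps
        if w_min > 0.0 and w_min < w_mid - eps:
            remove[a, b] = remove[b, a] = True
    net[remove] = 0.0                            # all marked edges are removed together
    return net


mi = np.array([[0.0, 0.9, 0.8],
               [0.9, 0.0, 0.1],
               [0.8, 0.1, 0.0]])
print(aracne_dpi(mi)[1, 2])   # 0.0: the weakest link of the triangle is pruned
```

Marking first and deleting afterwards makes the result independent of the order in which triplets are visited.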

Results and Discussion

Stability is modularity invariant

We demonstrate the invariance of the NetSI family with respect to network modularity in a controlled situation. We show that, for nine reconstruction algorithms, the proposed stability evaluation framework is not affected by the network structure. Moreover, we show that this property holds both when we adopt the HIM metric for the indicator computation and when we use its two components H and IM separately.

Data generation

We created a set of networks with 50 nodes each, consisting of a number of fully connected subgroups (modules) ranging from 1 to 10, which are linked to each other by single edges. With one module we obtain a fully connected network (without loops), while the networks with 2 to 10 modules are displayed in Fig. 5. For each network we report its modularity and density in Tab. 1.

Figure 5

Synthetic networks with 2 to 10 modules, from top left to bottom right.

Table 1

Modularity and density values of the 50-node networks for an increasing number of modules.

Modules	Modularity	Density
1	0.00	1.00
2	0.50	0.49
3	0.66	0.32
4	0.73	0.24
5	0.78	0.19
6	0.80	0.16
7	0.81	0.13
8	0.82	0.11
9	0.81	0.10
10	0.81	0.09
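The construction and the values in Table 1 can be checked with a short Python sketch. The text only states that the modules are linked by single edges, so the sketch assumes modules of near-equal size joined in a ring (one edge between consecutive modules, and a single edge when there are only two modules). Modularity is the standard Newman score $Q = \sum_c [e_c/m - (d_c/2m)^2]$ for the partition into modules, where $e_c$ is the number of edges inside module $c$, $d_c$ its total degree and $m$ the number of edges. Under these assumptions the computed values agree with Table 1 to two decimals; the function names are illustrative.

```python
def modular_network(n_nodes, n_modules):
    """Cliques of near-equal size joined in a ring by single edges (assumed wiring)."""
    base, extra = divmod(n_nodes, n_modules)
    modules, start = [], 0
    for c in range(n_modules):
        size = base + (1 if c < extra else 0)
        modules.append(list(range(start, start + size)))
        start += size
    edges = set()
    for mod in modules:                                   # fully connected modules
        edges.update((u, v) for u in mod for v in mod if u < v)
    n_links = n_modules if n_modules > 2 else n_modules - 1
    for c in range(n_links):                              # one edge between consecutive modules
        u, v = modules[c][-1], modules[(c + 1) % n_modules][0]
        edges.add((min(u, v), max(u, v)))
    return edges, modules


def density_and_modularity(n_nodes, edges, modules):
    """Edge density and Newman modularity of the given partition into modules."""
    m = len(edges)
    density = 2.0 * m / (n_nodes * (n_nodes - 1))
    label = {v: c for c, mod in enumerate(modules) for v in mod}
    internal = [0] * len(modules)
    degree_sum = [0] * len(modules)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    q = sum(e / m - (d / (2.0 * m)) ** 2 for e, d in zip(internal, degree_sum))
    return density, q


for k in range(1, 11):
    edges, modules = modular_network(50, k)
    density, q = density_and_modularity(50, edges, modules)
    print(k, f"{q:.2f}", f"{density:.2f}")
```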

The simulated gene expression values corresponding to the networks are generated by loading the corresponding adjacency matrices into the Gene Net Weaver (GNW) simulator [50]. Specifically, the tool is used to create simulated transcription datasets after a random initialization of each network's regulatory dynamics through a pre-loaded kinetic model [23]. GNW can generate either a steady-state dataset or a set of time series describing the network response to a perturbation, followed by perturbation removal until the steady state is reached. We chose to generate in one shot 50 time series (one for each sample) with default parameter settings, and to consider only the initial time point, since it corresponds to the wild-type steady state. Summarizing, we generated 10 synthetic datasets, each with simulated expression levels for 50 “genes” and 50 “samples”.

Results

We inferred networks from the 10 datasets with nine algorithms: ARACNE, CLR, cor, TOM, bicor, WGCNA and MIC, where the last two were also used with a permutation-based FDR filter (for details, see Subsection “FDR control effect on correlation networks”). The stability analysis with three possible network metrics (HIM, H and IM) on networks inferred with the nine approaches is reported in Fig. 6. In all cases, the stability varies by at most 0.06 across different modularity values, as detailed in Tab. 2. Hence, the stability indicator is essentially unaffected by different modular structures. However, reconstruction accuracy does depend on modularity (or density), as shown by a comparison with the gold standard (Fig. 7): for all methods, a lower distance from the gold standard is found for sparser networks.

Figure 6

Synthetic modular networks: stability for different modularity levels.

Table 2

Synthetic modular networks: variation range of the stability indicator across modularity levels for different reconstruction algorithms and network distances.

Algorithm	HIM	H	IM
WGCNA	0.01	0.01	0.00
WGCNAFDR	0.02	0.03	0.00
cor	0.02	0.03	0.00
MIC	0.01	0.02	0.00
MICFDR	0.04	0.06	0.00
ARACNE	0.02	0.03	0.01
bicor	0.02	0.03	0.00
TOM	0.01	0.01	0.00
CLR	0.03	0.05	0.01

Figure 7

Synthetic modular networks: HIM distance between the gold standard and the inferred synthetic networks for different modularity levels.

Inference on synthetic yeast-like networks

We investigated the behavior of the NetSI stability indicators for different sample sizes on a yeast-like dataset, again simulated by GNW.

Data Description

We considered a subnetwork of the Yeast transcriptional regulatory network available in GNW, namely the InSilicoSize100-Yeast2 dataset with 100 nodes, originally a DREAM3 benchmark, and generated 100 samples with the default parameter configuration, including noise level, for the wild-type steady state. We randomly extracted subsets of 10 different sample sizes, replicating the subset extraction procedure 50 times for each size. For each combination of resampling, we inferred the network with the same nine algorithms used in the previous experiment. As a general trend, the stability indicator decreases (i.e., the inferred networks become more stable) for larger sample sizes (see Fig. 8). The stability curves of the two popular methods ARACNE and CLR drop quickly above 20% of the sample size, improving over Pearson and bicor. TOM and WGCNA are more stable but require at least 50% of the data. The standard MIC-based method with the default parameter setting is considerably smoothed by the FDR correction. Overall, the FDR-corrected methods are the most stable even for small samples. TOM and WGCNA have the best internal stability (Fig. 9), followed by the FDR-corrected methods.

Figure 8

Yeast-like simulated data: effect of increasing sample size on network reconstruction stability.