Léo P M Diaz, Michael P H Stumpf.
Abstract
Network inference is a notoriously challenging problem. Inferred networks are associated with high uncertainty and are likely riddled with false positive and false negative interactions. Especially for biological networks, we do not have good ways of judging the performance of inference methods against real networks, and instead we often rely solely on the performance against simulated data. Gaining confidence in networks inferred from real data therefore requires establishing reliable validation methods. Here, we argue that the expectation of mixing patterns in biological networks such as gene regulatory networks offers a reasonable starting point: interactions are more likely to occur between nodes with similar biological functions. We can quantify this behaviour using the assortativity coefficient, and here we show that the resulting heuristic, functional assortativity, offers a reliable and informative route for comparing different inference algorithms.
Year: 2022 PMID: 35165295 PMCID: PMC8844311 DOI: 10.1038/s41598-022-05402-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
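The assortativity coefficient mentioned in the abstract can be illustrated with a minimal sketch. It assumes Newman's standard definition for a categorical attribute, r = (Tr e − Σ a²) / (1 − Σ a²), and assumes that every node carries exactly one functional label; real annotations often assign several functions per gene, and the paper's exact treatment is not shown here. The function name `assortativity` and the gene and label names are hypothetical.

```python
import math
from collections import defaultdict

def assortativity(edges, label):
    """Assortativity coefficient of an undirected network for one categorical label.

    edges: iterable of (u, v) pairs
    label: dict mapping each node to a single category (an assumption)
    Returns NaN when the coefficient is undefined (no edges, or one category only).
    """
    edges = list(edges)
    if not edges:
        return math.nan
    e = defaultdict(float)          # e[(i, j)]: fraction of edge ends joining i to j
    w = 1.0 / (2 * len(edges))      # each undirected edge is counted in both directions
    for u, v in edges:
        e[(label[u], label[v])] += w
        e[(label[v], label[u])] += w
    cats = {c for pair in e for c in pair}
    # a[c]: fraction of edge ends attached to category c (row sums of e)
    a = {c: sum(e.get((c, d), 0.0) for d in cats) for c in cats}
    trace = sum(e.get((c, c), 0.0) for c in cats)
    sum_a2 = sum(x * x for x in a.values())
    if math.isclose(sum_a2, 1.0):
        return math.nan
    return (trace - sum_a2) / (1.0 - sum_a2)

# Hypothetical toy network: every edge joins genes with the same function.
labels = {"g1": "stress", "g2": "stress", "g3": "cell cycle", "g4": "cell cycle"}
print(assortativity([("g1", "g2"), ("g3", "g4")], labels))  # 1.0
```

A value near 1 indicates that edges mostly join nodes with the same label, a value near 0 indicates no preference, and negative values indicate that edges mostly join nodes with different labels.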
Description of inference algorithms compared.
| Algorithm | Description | References |
|---|---|---|
| Linear correlation | Measures the linear correlation between a pair of random variables | [ |
| Rank correlation | Measures the rank correlation between a pair of random variables | [ |
| MI | Measures dependency between variables using the mutual information, that is, the sum of the entropies of the variables minus their joint entropy; it represents the amount of information gained about one variable when another variable is known | [ |
| CLR | Based on the value of the MI between pairs of variables in the context of MI scores for each possible combination of variable pairs. This approach is referred to as | [ |
| PUC | Based on the mean unique information between variable pairs that accounts for their MI, as calculated via the partial information for each possible triplet of variables containing a given pair | [ |
| PIDC | Builds on the PUC approach by taking the network context into account in a similar way to CLR, i.e. by considering the overall distribution of PUC values | [ |
| GENIE3 | Creates as many regression problems as the number of input genes, then uses random forests to infer edges and their nature (genes are considered putative TFs if setting them as nodes on the trees reduces the variance of the predicted output) | [ |
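The three simplest pairwise scores in the table (linear correlation, rank correlation, and MI) can be sketched in plain Python. This is an illustrative sketch rather than the implementation compared in the paper: the function names are hypothetical, and the MI estimate assumes the data have already been discretised, which is a separate step for continuous expression measurements.

```python
import math
from collections import Counter

def pearson(x, y):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(x, y):
    """MI in bits: H(X) + H(Y) - H(X, Y), for already-discretised data."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))
```

For example, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` is 1 because the relationship is monotonic, whereas `pearson` on the same data is slightly below 1 because the relationship is not linear.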
Figure 1. Evolution of the functional assortativity coefficient (FAC) as a function of the number of edges in a relevance network where edges are introduced in the order implied by their score.
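A curve of this kind could be computed along the following lines. The sketch assumes a single categorical label per node and the standard categorical assortativity coefficient; it updates running counts as each edge is added rather than recomputing from scratch. The name `fac_curve` and the input layout are hypothetical.

```python
from collections import Counter

def fac_curve(scored_edges, label):
    """Yield (number_of_edges, coefficient) as edges are added by decreasing score.

    scored_edges: iterable of (u, v, score) for undirected edges
    label: dict mapping each node to one category (an assumption)
    The coefficient is None while it is undefined (all edge ends in one category).
    """
    ends = Counter()   # number of edge ends attached to each category
    same = 0           # number of edges whose two ends share a category
    ranked = sorted(scored_edges, key=lambda t: t[2], reverse=True)
    for m, (u, v, _score) in enumerate(ranked, start=1):
        cu, cv = label[u], label[v]
        ends[cu] += 1
        ends[cv] += 1
        same += cu == cv
        trace = same / m                                      # Tr(e)
        sum_a2 = sum((n / (2 * m)) ** 2 for n in ends.values())
        r = None if sum_a2 == 1 else (trace - sum_a2) / (1 - sum_a2)
        yield m, r
```

Plotting the yielded pairs for each inference method gives one line per method, which makes it possible to compare how the coefficient changes as lower-scoring edges enter the network.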
Figure 2. A selection of Venn diagrams showing patterns of overlap between three given inference methods for relevance networks with 200 edges. Overlaps are according to the number of edges shared between the given inference methods. Large overlap can mean that the different methods detect the same signal, which does not necessarily mean that these are true edges. These diagrams thus provide an assessment of the concordance of the different inference methods.
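The region sizes of such a three-way Venn diagram can be obtained with set operations. The sketch below assumes each method provides a dictionary of scores keyed by node pair, with each undirected pair listed once; `top_edges` and `venn3_counts` are hypothetical helper names, and the default of 200 edges follows the caption.

```python
def top_edges(scores, k=200):
    """Return the k highest-scoring undirected edges as a set of frozensets.

    scores: dict mapping (u, v) to a score; each pair is assumed to appear once.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {frozenset(pair) for pair, _ in ranked[:k]}

def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_b": len((a & b) - c),
        "a_c": len((a & c) - b),
        "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }
```

Using `frozenset` for each edge means that `("g1", "g2")` and `("g2", "g1")` are treated as the same undirected edge when the methods are compared.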
Figure 3. Illustrating the behaviour of the FAC under noisy conditions. Mean (solid line) and standard deviation from the mean (shaded area) of the FAC as pairs of edges are rewired at random (left column), and as nodes are randomly attributed a different annotation (central column); each plot shows 1000 repeats. Right column: comparison of the observed FAC for networks with 200 edges (vertical line) against distributions of the FAC in 1000 random networks with 200 nodes; blue, orange, and red coloured bands respectively indicate one, two and three standard deviations from the mean.
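The comparison in the right column can be approximated with a simple null model. The sketch below is an assumption-laden illustration: the caption does not specify how the random networks are generated, so here each random network keeps the same nodes and number of edges and draws its edges uniformly at random. It also assumes one categorical label per node; all function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from itertools import combinations

def assortativity(edges, label):
    """Categorical assortativity coefficient; NaN when undefined."""
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return math.nan
    ends, same = Counter(), 0
    for u, v in edges:
        ends[label[u]] += 1
        ends[label[v]] += 1
        same += label[u] == label[v]
    sum_a2 = sum((n / (2 * m)) ** 2 for n in ends.values())
    if sum_a2 == 1:
        return math.nan
    return (same / m - sum_a2) / (1 - sum_a2)

def null_comparison(edges, label, n_random=1000, seed=0):
    """Compare the observed coefficient with uniformly random networks.

    Returns (observed, null_mean, null_sd, z_score). The null model is an
    assumption: same nodes, same number of edges, edges drawn uniformly.
    """
    edges = list(edges)
    rng = random.Random(seed)
    all_pairs = list(combinations(sorted(label), 2))
    observed = assortativity(edges, label)
    null = []
    for _ in range(n_random):
        r = assortativity(rng.sample(all_pairs, len(edges)), label)
        if not math.isnan(r):       # skip the rare undefined cases
            null.append(r)
    mean, sd = statistics.mean(null), statistics.stdev(null)
    return observed, mean, sd, (observed - mean) / sd
```

The returned z-score expresses how many standard deviations the observed value lies from the mean of the random networks, which corresponds to the coloured one, two and three standard deviation bands described in the caption.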