Deniz Seçilmiş, Sven Nelander, Erik L L Sonnhammer.
Abstract
Accurate inference of gene regulatory networks (GRNs) is important for unraveling unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false positives and false negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. Lacking criteria for sparsity selection, a simplistic solution is to pick the GRN with a certain number of links per gene that is guessed to be reasonable. However, this does not guarantee finding the GRN that has the correct sparsity or is the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a "GRN information criterion" (GRNIC) that is inspired by two commonly used model selection criteria, the Akaike and Bayesian Information Criteria (AIC and BIC), but adapted to GRN inference. The results show that the approach can, in most cases, find the GRN whose sparsity is close to the true sparsity and whose accuracy is close to the best achievable with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/.
Keywords: gene expression data; gene regulatory network inference; information criteria; noise in gene expression; sparsity selection
Year: 2022 PMID: 35923701 PMCID: PMC9340570 DOI: 10.3389/fgene.2022.855770
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Workflow of SPA. SPA takes a set of inferred GRNs with varying sparsities, the measured gene expression in fold changes, and the perturbation design as input. It then uses the GRN Information Criterion (GRNIC) as described in Algorithm 1 and identifies the GRN that minimizes GRNIC as the best GRN.
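Algorithm 1 is not reproduced in this record, so the following is only a rough sketch of the general idea behind the workflow: score each candidate GRN with an information criterion and keep the one with the lowest score. It uses the textbook Gaussian BIC as a stand-in, not GRNIC itself. The function names, the `(label, n_links, rss)` tuple layout, and all numbers are hypothetical, and the residual sum of squares between measured and predicted expression is assumed to have been computed elsewhere.

```python
import math


def bic_like_score(rss, n_obs, n_links):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n).

    Stand-in for illustration only; this is NOT the GRNIC of the paper.
    """
    return n_obs * math.log(rss / n_obs) + n_links * math.log(n_obs)


def select_network(candidates, n_obs):
    """Return the candidate tuple (label, n_links, rss) with the lowest score."""
    return min(candidates, key=lambda c: bic_like_score(c[2], n_obs, c[1]))


# Hypothetical candidates: (label, number of links, residual sum of squares).
candidates = [
    ("sparse", 10, 50.0),   # few links, poor fit
    ("medium", 30, 12.0),   # moderate links, good fit
    ("dense", 90, 11.0),    # many links, marginally better fit
]

best = select_network(candidates, n_obs=100)
print(best[0])
```

With these made-up numbers the dense network fits slightly better than the medium one, but the penalty on its extra links outweighs the gain, which is the trade-off a sparsity-selection criterion is meant to capture.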
FIGURE 2. Performance evaluation of the sparsity selection pipeline in terms of the F1-score. F1-scores of the inferred GRNs from datasets generated by GeneNetWeaver with (A) high and (B) low noise levels. Each panel contains F1-scores from five datasets for two categories: GRNIC (circle) and maximum achieved in inference (star).
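For readers unfamiliar with the metric, the F1-score of an inferred GRN against a known true GRN can be computed as the harmonic mean of precision and recall over links. The sketch below assumes links are directed and unsigned, represented as `(regulator, target)` pairs; whether the study also scores link signs is not stated in this record. The gene names and link sets are made up.

```python
def f1_score(predicted_links, true_links):
    """F1 = harmonic mean of precision and recall over sets of links."""
    tp = len(predicted_links & true_links)   # links correctly predicted
    fp = len(predicted_links - true_links)   # predicted but not true
    fn = len(true_links - predicted_links)   # true but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four true links, three predicted, two in common.
true_links = {("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G1", "G4")}
predicted_links = {("G1", "G2"), ("G2", "G3"), ("G4", "G1")}

score = f1_score(predicted_links, true_links)
print(round(score, 3))
```

Here precision is 2/3 and recall is 1/2, so the F1-score is 4/7.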
FIGURE 3. Performance evaluation of the sparsity selection pipeline in terms of sparsity. The sparsity of the inferred GRNs (the GRN having the maximum F1-score, and the one selected by GRNIC) is shown in terms of the average number of links for (A) high and (B) low noise levels for the five networks. The sparsities of the five true GRNs are shown in an extra column to the right.
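As a small illustration of the sparsity measure, the sketch below computes the average number of links per gene from a weighted adjacency matrix. It assumes the average is taken per gene and that any nonzero entry, including a self-loop on the diagonal, counts as one link; the record does not specify either convention. The matrix is hypothetical.

```python
def average_links_per_gene(adjacency):
    """Number of nonzero entries divided by the number of genes (rows)."""
    n_genes = len(adjacency)
    n_links = sum(1 for row in adjacency for weight in row if weight != 0)
    return n_links / n_genes


# Hypothetical 4-gene network with 6 nonzero links.
adjacency = [
    [-1.0, 0.0, 0.5, 0.0],
    [0.8, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, -0.3, 0.0, -1.0],
]

sparsity = average_links_per_gene(adjacency)
print(sparsity)
```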