Tobias Petri1, Stefan Altmann1, Ludwig Geistlinger1, Ralf Zimmer1, Robert Küffner2. 1. Ludwig-Maximilians-Universität München, Institut für Informatik, Munich, Germany. 2. Ludwig-Maximilians-Universität München, Institut für Informatik, Munich, Germany, and Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
Abstract
MOTIVATION: Experimentally determined gene regulatory networks can be enriched by computational inference from high-throughput expression profiles. However, the prediction of regulatory interactions is severely impaired by indirect and spurious effects, particularly for eukaryotes. Recently published methods report improved predictions by exploiting the a priori known targets of a regulator (its local topology) in addition to expression profiles.

RESULTS: We find that methods exploiting known targets show an unexpectedly high rate of false discoveries. This leads to inflated performance estimates and to the prediction of an excessive number of new interactions for regulators with many known targets. Because of Simpson's paradox, these issues remain hidden from common evaluation and cross-validation setups. We suggest a confidence score recalibration method (CoRe) that reduces the false discovery rate and enables reliable performance estimation.

CONCLUSIONS: CoRe considerably improves the results of network inference methods that exploit known targets. Predictions then reflect the biological process specificity of regulators more accurately and enable the inference of accurate genome-wide regulatory networks in eukaryotes. For yeast, we propose a network with more than 22 000 confident interactions. We point out that machine learning approaches outside the area of network inference may be affected as well.

AVAILABILITY AND IMPLEMENTATION: Results, executable code and networks are available via our website http://www.bio.ifi.lmu.de/forschung/CoRe.

CONTACT: robert.kueffner@helmholtz-muenchen.de

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
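The claim that pooled evaluation can hide per-regulator problems can be illustrated with a toy calculation. The sketch below is not CoRe and does not use the authors' data: the two regulators, their candidate counts and the scoring rule are all hypothetical. Every candidate target of a regulator receives the same score, namely that regulator's fraction of true targets, so the score carries no information about which genes of a given regulator are true targets. Evaluated per regulator, the AUC is at chance level; evaluated on the pooled list, it looks far better, purely because one regulator has many more targets.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy setup: regulator -> (number of candidates, number of true targets).
groups = {"reg_many": (100, 50), "reg_few": (100, 5)}

data = {}
for reg, (n_cand, n_true) in groups.items():
    labels = [1] * n_true + [0] * (n_cand - n_true)
    # Assumed scoring rule: one constant score per regulator (its target fraction),
    # i.e. no ability to rank genes within a regulator.
    scores = [n_true / n_cand] * n_cand
    data[reg] = (scores, labels)

# Evaluation stratified by regulator: chance level for both.
per_reg_auc = {reg: auc(s, y) for reg, (s, y) in data.items()}

# Evaluation on the pooled list of all regulator-gene pairs.
pooled_scores = [s for sc, _ in data.values() for s in sc]
pooled_labels = [y for _, lb in data.values() for y in lb]
pooled_auc = auc(pooled_scores, pooled_labels)

print(per_reg_auc)           # 0.5 for each regulator
print(round(pooled_auc, 3))  # about 0.78
```

In this toy case the pooled estimate of about 0.78 is produced entirely by the difference in target counts between regulators, which is the kind of aggregation effect the abstract attributes to Simpson's paradox.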