Florian Erhard, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, 97078 Würzburg, Germany.
Abstract
Motivation: Fold changes from count-based high-throughput experiments such as RNA-seq suffer from a zero-frequency problem. To circumvent division by zero, so-called pseudocounts are added to make all observed counts strictly positive. The magnitude of pseudocounts for digital expression measurements, and the stage of the analysis at which they are introduced, have remained arbitrary choices. Moreover, in the strict sense, fold changes are not quantities that can be computed. Instead, due to the stochasticity involved in the experiments, they must be estimated by statistical inference.

Results: Here, we build on a statistical framework for fold changes, where pseudocounts correspond to the parameters of the prior distribution used for Bayesian inference of the fold change. We show that arbitrary and widely used choices for applying pseudocounts can lead to biased results. As a statistically rigorous alternative, we propose and test an empirical Bayes procedure to choose appropriate pseudocounts. Moreover, we introduce the novel estimator Ψ LFC for fold changes, which shows favorable properties with small counts and smaller deviations from the truth in simulations and real data compared to existing methods. Our results have direct implications for entities with few reads in sequencing experiments, and indirectly also affect results for entities with many reads.

Availability and implementation: Ψ LFC is available as an R package at https://github.com/erhard-lab/lfc (Apache 2.0 license); R scripts to generate all figures are available at Zenodo (doi: 10.5281/zenodo.1163029).

Supplementary information: Supplementary data are available at Bioinformatics online.
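The idea stated in the Results, that pseudocounts act as parameters of a prior distribution, can be illustrated with a minimal Python sketch. It assumes a simple model not spelled out in the abstract: the two counts a and b are treated as binomial draws with a Beta(alpha, beta) prior on the proportion, so the posterior is Beta(a + alpha, b + beta), and the posterior mean of the log2 odds is (psi(a + alpha) - psi(b + beta)) / ln 2, where psi is the digamma function. The function names are hypothetical, no normalization across entities is performed, and this is not claimed to reproduce the implementation in the R package.

```python
import math


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def pseudocount_lfc(a: float, b: float, pseudocount: float = 1.0) -> float:
    """Conventional log2 fold change after adding the same pseudocount to both counts."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def posterior_mean_lfc(a: float, b: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of log2(p / (1 - p)) under a Beta(a + alpha, b + beta) posterior.

    alpha and beta play the role of pseudocounts (assumed prior parameters).
    """
    return (digamma(a + alpha) - digamma(b + beta)) / math.log(2.0)


if __name__ == "__main__":
    # Small counts are where the two estimates differ most.
    for a, b in [(0, 0), (1, 0), (3, 1), (300, 100)]:
        print(a, b, round(pseudocount_lfc(a, b), 3), round(posterior_mean_lfc(a, b), 3))
```

In this sketch both estimates are zero for equal counts with a symmetric prior, and they converge as counts grow; the choice of alpha and beta matters mainly for entities with few reads, which is the regime the abstract highlights.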