Jarosław Chilimoniuk1,2, Alicja Gosiewska3, Jadwiga Słowik3, Romano Weiss2, P Markus Deckert4, Stefan Rödiger2,5, Michał Burdukiewicz2,6.
1. Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland.
2. Faculty of Natural Sciences, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany.
3. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
4. Faculty of Medicine and Psychology, Brandenburg Medical School Theodor Fontane, and Faculty of Health Sciences Brandenburg, Brandenburg Medical School Theodor Fontane, Brandenburg, Germany.
5. Faculty of Health Sciences Brandenburg, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany.
6. Medical University of Białystok, Białystok, Poland.
Abstract
BACKGROUND: DNA double-strand breaks can be counted as discrete foci by imaging techniques. In personalized medicine and pharmacology, the analysis of count data is relevant for numerous applications, e.g., cancer and aging research and the evaluation of drug efficacy. By default, such data are assumed to follow the Poisson distribution. This assumption, however, may lead to biased results and faulty conclusions in datasets with excess zero values (zero-inflation), a variance larger than the mean (overdispersion), or both. In such cases, assuming a Poisson distribution skews the estimates of mean and variance, and other models such as the negative binomial (NB), zero-inflated Poisson, or zero-inflated NB distribution should be employed. The chosen model influences the parameter estimates (mean value and confidence interval). Yet choosing a suitable distribution model is not trivial.
METHODS: To support, simplify, and objectify this process, we developed the countfitteR software as an R package. We used a Bayesian approach for distribution model selection and the shiny web application framework for interactive data analysis.
RESULTS: We demonstrate the software on examples of DNA double-strand break count data from phenotypic imaging by multiplex fluorescence microscopy. In analyzing numerous datasets of molecular pharmacological markers (phosphorylated histone H2AX and p53 binding protein), countfitteR demonstrated equal or superior statistical performance compared to the commonly used two-step procedure, with an overall power of up to 98%. In addition, it still provided information in cases where the two-step procedure gave no result at all. In our data sample, the NB distribution was the most frequent, followed by the Poisson distribution.
CONCLUSIONS: countfitteR can perform automated distribution model selection, thereby supporting data analysis and yielding objective, statistically verifiable estimates. Originally designed for the analysis of foci in biomedical image data, countfitteR can be used in a variety of areas where non-Poisson-distributed count data are prevalent. 2021 Annals of Translational Medicine. All rights reserved.
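To make the overdispersion and model-choice argument of the abstract concrete, the following is a minimal sketch, not the countfitteR implementation or its API. It assumes a small made-up vector of foci counts, fits only two of the four candidate models (Poisson and NB) by maximum likelihood, and compares them with the Bayesian information criterion (BIC), which is used here as a simple stand-in for the package's Bayesian model selection. All function and variable names are hypothetical.

```python
import math


def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts (lam must be > 0)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def nb_loglik(counts, mu, size):
    """NB log-likelihood in the mean/size form: variance = mu + mu**2 / size."""
    p = size / (size + mu)
    return sum(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1 - p)
        for k in counts
    )


def fit_nb_size(counts, mu, lo=-5.0, hi=10.0, iters=80):
    """Golden-section search for the size parameter on the log scale.

    mu is held at the sample mean, which is its maximum likelihood
    estimate in this parameterization. The search bounds are arbitrary
    choices made for this sketch.
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if nb_loglik(counts, mu, math.exp(c)) > nb_loglik(counts, mu, math.exp(d)):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2)


def compare_models(counts):
    """Return summary statistics and the BIC of the Poisson and NB fits."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((k - mean) ** 2 for k in counts) / (n - 1)
    size = fit_nb_size(counts, mean)
    # BIC = (number of parameters) * ln(n) - 2 * log-likelihood; lower is better.
    bic = {
        "poisson": 1 * math.log(n) - 2 * poisson_loglik(counts, mean),
        "nb": 2 * math.log(n) - 2 * nb_loglik(counts, mean, size),
    }
    return {
        "mean": mean,
        "variance": variance,
        "zero_fraction": counts.count(0) / n,
        "nb_size": size,
        "bic": bic,
        "best": min(bic, key=bic.get),
    }


# Hypothetical foci counts per cell: many zeros and a few large values.
foci_counts = [0, 0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 12, 0, 3, 0, 0, 9, 0, 1]
result = compare_models(foci_counts)
print(result["mean"], result["variance"], result["best"])
```

In this made-up sample the variance is far larger than the mean, so a Poisson fit would understate the spread of the data. Extending the comparison to the zero-inflated Poisson and zero-inflated NB models would require an additional mixing parameter for the excess zeros, which this sketch omits.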
Keywords:
DNA damage; count data; overdispersion; zero-inflation