
Why you cannot transform your way out of trouble for small counts.

David I Warton

Abstract

While data transformation is a common strategy to satisfy linear modeling assumptions, a theoretical result is used to show that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assumptions, as counts get smaller, it is shown that the variance becomes proportional to the mean under monotonic transformations g(·) that satisfy g(0)=0, excepting a few pathological cases. A suggested rule-of-thumb is that if many predicted counts are less than one then data transformation cannot reasonably be expected to stabilize variances, even for a well-chosen transformation. This result has clear implications for the analysis of counts as often implemented in the applied sciences, but particularly for multivariate analysis in ecology. Multivariate discrete data are often collected in ecology, typically with a large proportion of zeros, and it is currently widespread to use methods of analysis that do not account for differences in variance across observations nor across responses. Simulations demonstrate that failure to account for the mean-variance relationship can have particularly severe consequences in this context, and also in the univariate context if the sampling design is unbalanced.
© 2017 The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
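The abstract's central claim can be checked numerically. The following is a minimal sketch, not the paper's own simulation: it assumes Poisson-distributed counts and uses two illustrative transformations with g(0)=0, the square root and log(y + 1), both chosen here for illustration. The function name `transformed_variance` and the chosen means are hypothetical. For very small means a count is almost always 0 or 1, so one would expect Var[g(Y)] to be roughly g(1)^2 times the mean, that is, proportional to the mean rather than constant.

```python
import math

def poisson_pmf(k, mu):
    # Computed on the log scale to avoid overflow for large k.
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def transformed_variance(g, mu, kmax=200):
    """Variance of g(Y) for Y ~ Poisson(mu), summing the pmf over 0..kmax.

    Assumption: kmax is large enough that the truncated tail is negligible
    for the means used below.
    """
    probs = [poisson_pmf(k, mu) for k in range(kmax + 1)]
    first_moment = sum(p * g(k) for k, p in enumerate(probs))
    second_moment = sum(p * g(k) ** 2 for k, p in enumerate(probs))
    return second_moment - first_moment ** 2

# Illustrative transformations satisfying g(0) = 0 (an assumption of this sketch).
transforms = {"sqrt": math.sqrt, "log1p": math.log1p}
small_means = [0.01, 0.05, 0.1]

# Ratio Var[g(Y)] / mean for small means: close to g(1)^2 if variance is
# proportional to the mean.
ratios = {
    name: [transformed_variance(g, mu) / mu for mu in small_means]
    for name, g in transforms.items()
}

# For contrast, the square root does stabilize the variance (near 1/4)
# when the mean is large.
large_mean_var = transformed_variance(math.sqrt, 50.0)

for name, values in ratios.items():
    print(name, [round(v, 3) for v in values])
print("sqrt variance at mean 50:", round(large_mean_var, 3))
```

Under these assumptions the ratios sit near g(1)^2 (1 for the square root, about 0.48 for log(y + 1)) across the small means, so the transformed variance tracks the mean, while at a mean of 50 the square-root-transformed variance is close to the constant 1/4 that the transformation is meant to deliver.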

Keywords:  Community composition; Ecology; Generalized linear models; Multivariate analysis; Variance stabilizing transformation

Year:  2017        PMID: 28504821     DOI: 10.1111/biom.12728

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571


  7 in total

1.  Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data.

Authors:  Lélis A Carlos-Júnior; Joel C Creed; Rob Marrs; Rob J Lewis; Timothy P Moulton; Rafael Feijó-Lima; Matthew Spencer
Journal:  PeerJ       Date:  2020-09-03       Impact factor: 2.984

2.  Normalization of single-cell RNA-seq counts by log(x + 1)* or log(1 + x).

Authors:  A Sina Booeshaghi; Lior Pachter
Journal:  Bioinformatics       Date:  2021-03-02       Impact factor: 6.937

3.  Stock delineation of striped snakehead, Channa striata using multivariate generalised linear models with otolith shape and chemistry data.

Authors:  Salman Khan; Hayden T Schilling; Mohammad Afzal Khan; Devendra Kumar Patel; Ben Maslen; Kaish Miyan
Journal:  Sci Rep       Date:  2021-04-14       Impact factor: 4.379

4.  glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data.

Authors:  Constantin Ahlmann-Eltze; Wolfgang Huber
Journal:  Bioinformatics       Date:  2021-04-05       Impact factor: 6.937

5.  Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data.

Authors:  Jan Lause; Philipp Berens; Dmitry Kobak
Journal:  Genome Biol       Date:  2021-09-06       Impact factor: 13.583

6.  Statistics or biology: the zero-inflation controversy about scRNA-seq data. (Review)

Authors:  Ruochen Jiang; Tianyi Sun; Dongyuan Song; Jingyi Jessica Li
Journal:  Genome Biol       Date:  2022-01-21       Impact factor: 13.583

7.  SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies.

Authors:  Jiaqiang Zhu; Shiquan Sun; Xiang Zhou
Journal:  Genome Biol       Date:  2021-06-21       Impact factor: 13.583

