Christof Seiler (1, 2, 3), Anne-Maud Ferreira (4), Lisa M Kronstad (5, 6, 7), Laura J Simpson (5, 6), Mathieu Le Gars (5, 6), Elena Vendrame (5, 6), Catherine A Blish (5, 6, 8), Susan Holmes (4).
1. Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands. christof.seiler@maastrichtuniversity.nl.
2. Mathematics Centre Maastricht, Maastricht University, Maastricht, The Netherlands.
3. Department of Statistics, Stanford University, Stanford, USA.
4. Department of Statistics, Stanford University, Stanford, USA.
5. Immunology Program, Stanford University School of Medicine, Stanford, USA.
6. Department of Medicine, Stanford University School of Medicine, Stanford, USA.
7. Department of Microbiology and Immunology, Midwestern University, Downers Grove, USA.
8. Chan Zuckerberg Biohub, San Francisco, USA.
Abstract
BACKGROUND: Flow and mass cytometry are important modern immunology tools for measuring expression levels of multiple proteins on single cells. The goal is to better understand the mechanisms of responses on a single-cell basis by studying differential expression of proteins. Most current data analysis tools compare expression across many computationally discovered cell types. Our goal is to focus on just one cell type. This narrower field of application allows us to define a more specific statistical model with statistical guarantees that are easier to control.
RESULTS: Differential analysis of marker expression can be difficult due to marker correlations and inter-subject heterogeneity, particularly in studies of human immunology. We address these challenges with two multiple regression strategies: a bootstrapped generalized linear model and a generalized linear mixed model. On simulated datasets, we compare the robustness of both strategies to marker correlations and heterogeneity. For paired experiments, we find that both strategies maintain the target false discovery rate under medium correlations and that mixed models are statistically more powerful under the correct model specification. For unpaired experiments, our results indicate that much larger patient sample sizes are required to detect differences. We illustrate the CytoGLMM R package and workflow for both strategies on a pregnancy dataset.
CONCLUSION: Our approach to finding differential proteins in flow and mass cytometry data reduces biases arising from marker correlations and safeguards against false discoveries induced by patient heterogeneity.
Keywords:
Generalized linear mixed models; Generalized linear models; High-dimensional cytometry