
CYBERTRACK2.0: zero-inflated model-based cell clustering and population tracking method for longitudinal mass cytometry data.

Kodai Minoura, Ko Abe, Yuka Maeda, Hiroyoshi Nishikawa, Teppei Shimamura.

Abstract

SUMMARY: Recent advancements in high-dimensional single-cell technologies, such as mass cytometry, enable longitudinal experiments to track dynamics of cell populations and identify change points where the proportions vary significantly. However, current research is limited by the lack of tools specialized for analyzing longitudinal mass cytometry data. In order to infer cell population dynamics from such data, we developed a statistical framework named CYBERTRACK2.0. The framework's analytic performance was validated against synthetic and real data, showing that its results are consistent with previous research.
AVAILABILITY AND IMPLEMENTATION: CYBERTRACK2.0 is available at https://github.com/kodaim1115/CYBERTRACK2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.


Year:  2021        PMID: 33051653      PMCID: PMC8275981          DOI: 10.1093/bioinformatics/btaa873

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

High-dimensional single-cell technology, such as mass cytometry (cytometry by time-of-flight), makes it possible to investigate the expression patterns of pre-defined sets of surface and intracellular proteins at single-cell resolution (Spitzer and Nolan, 2016). Recently, longitudinal analysis using mass cytometry has yielded important information that cannot be obtained from conventional analysis of a single time point. In the field of cancer immunity, mass cytometry analysis of tumor samples taken from the same patient at different time points is increasingly being used to better understand response and resistance to immune checkpoint blockade (Chen et al.; Greenplate et al.). For example, longitudinal mass cytometry analysis of paired peripheral blood biopsies taken before and after anti-PD-1 treatment revealed that the frequency of a certain monocyte subset was strongly associated with the patients' responsiveness to the treatment (Krieg et al.). One of the main objectives of analyzing longitudinal cytometry data is to identify the underlying dynamics of cell populations and to track their temporal fluctuation. Recently, we proposed CYBERTRACK, a statistical framework based on the Topic Tracking Model and designed for analyzing longitudinal flow cytometry data (Iwata et al.; Minoura et al.). Although it is a powerful tool for discovering cell population dynamics from such data, it has some limitations. One limitation is that it cannot be applied directly to mass cytometry data: because of the high proportion of zeros, the data do not follow the probability distribution assumed in CYBERTRACK. A zero in mass cytometry data indicates that the number of metal isotopes was below the detection limit of the instrument because the expression of the marker protein in the cell was low. These zero values are typically replaced by random numbers to avoid the computational problems that arise when many cells share the same value.
Although this approach is commonly adopted for practical convenience, it underutilizes the information in the data. Another limitation is that CYBERTRACK uses a stochastic expectation-maximization (EM) algorithm, which is not well suited to detecting rare cell populations (Naim and Gildea, 2012). Like the EM algorithm, the stochastic EM algorithm often misses very rare populations when they lie near large populations. In these cases, it tends to lump rare populations together with larger ones, which can lead to a misreading of the data. Because studies using mass cytometry often aim to discover the dynamics of rare cell populations that consist of <1% of the total population, tools that correctly identify such populations are extremely important. To address these problems, we present an updated version of CYBERTRACK, CYBERTRACK2.0, for the automatic clustering and tracking of proportionally mixed cell populations in longitudinal mass cytometry data.

2 Materials and methods

The improvements in CYBERTRACK2.0 are as follows: (i) a new probabilistic model for generating mass cytometry data, based on a zero-inflated multivariate Gaussian mixture distribution, that can handle the high proportion of zeros in such data; and (ii) a new algorithm for detecting rare cell populations that combines the stochastic EM algorithm with weighted iterative sampling (Naim et al.). Figure 1 shows a conceptual view of the CYBERTRACK2.0 analysis flow. We provide an efficient and straightforward algorithm for estimating the parameters of the proposed model. The model and the estimation procedure are explained in detail in the Supplementary Material.
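To make the idea of a zero-inflated Gaussian mixture concrete, the following is a minimal generative sketch in Python. It is not the model specified in the Supplementary Material: the function name, the diagonal covariance, and the per-cluster, per-marker zero probabilities are all simplifying assumptions chosen for illustration.

```python
import random


def sample_zero_inflated_gmm(n_cells, weights, means, sds, zero_probs, seed=0):
    """Draw cells from a toy zero-inflated Gaussian mixture (illustrative only).

    Assumptions (not taken from the paper): each cluster k has a mixing
    weight, per-marker means and standard deviations (diagonal covariance),
    and a per-marker probability that the value is recorded as an exact zero.
    """
    rng = random.Random(seed)
    cells, labels = [], []
    for _ in range(n_cells):
        # Pick a cluster according to the mixture proportions.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        cell = []
        for mu, sd, p0 in zip(means[k], sds[k], zero_probs[k]):
            if rng.random() < p0:
                cell.append(0.0)                # zero-inflation component
            else:
                cell.append(rng.gauss(mu, sd))  # Gaussian component
        cells.append(cell)
        labels.append(k)
    return cells, labels


# Hypothetical example: one large population and one rare (1%) population,
# measured on two markers.
cells, labels = sample_zero_inflated_gmm(
    n_cells=1000,
    weights=[0.99, 0.01],
    means=[[4.0, 0.5], [1.0, 5.0]],
    sds=[[0.5, 0.3], [0.4, 0.5]],
    zero_probs=[[0.05, 0.7], [0.6, 0.02]],
)
zero_fraction = sum(v == 0.0 for c in cells for v in c) / (2 * len(cells))
```

In this toy setting a marker that is weakly expressed in a cluster gets a high zero probability, which mimics the excess of exact zeros that a plain Gaussian mixture cannot represent.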
Fig. 1.

Conceptual view of CYBERTRACK2.0. Our method takes longitudinal mass cytometry data as input. The inference process of CYBERTRACK2.0 is based on a stochastic EM algorithm for a zero-inflated GMM, which consists of (i) replacing zeros by Gibbs sampling from the underlying distributions and (ii) estimating the cluster parameters. As output, CYBERTRACK2.0 provides information on cell clustering, cell population tracking and change-points in the overall mixture proportion. It can impute missing values in mass cytometry data by Gibbs sampling from the estimated probability distributions. It also implements a modified weighted iterative sampling algorithm to find very rare cell populations.
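The two alternating steps named in the caption can be sketched for a single marker and a single cluster as follows. This is a simplified illustration, not the CYBERTRACK2.0 implementation: it assumes that zeros are values censored below a known detection limit (the `limit` argument is hypothetical) and that each zero is replaced by a draw from the current Gaussian truncated to lie below that limit.

```python
import random
from statistics import NormalDist, fmean, pstdev


def draw_below_limit(mu, sd, limit, rng):
    """Sample from N(mu, sd^2) restricted to values below `limit` (inverse CDF)."""
    dist = NormalDist(mu, sd)
    upper = dist.cdf(limit)
    # Clamp so that the argument of inv_cdf stays strictly inside (0, 1).
    u = min(max(rng.random() * upper, 1e-12), 1 - 1e-12)
    return dist.inv_cdf(u)


def stochastic_em_1d(values, limit, n_iter=50, seed=0):
    """Alternate (i) zero replacement and (ii) parameter re-estimation.

    Assumption for this sketch: an exact zero means "below `limit`".
    """
    rng = random.Random(seed)
    observed = [v for v in values if v != 0.0]
    n_zero = len(values) - len(observed)
    # Initialize from the non-zero values only (biased upward when censored).
    mu, sd = fmean(observed), pstdev(observed)
    completed = list(observed)
    for _ in range(n_iter):
        # (i) replace zeros by draws from the current truncated distribution
        imputed = [draw_below_limit(mu, sd, limit, rng) for _ in range(n_zero)]
        completed = observed + imputed
        # (ii) re-estimate the cluster parameters from the completed data
        mu, sd = fmean(completed), pstdev(completed)
    return mu, sd, completed
```

The returned `completed` list plays the role of the imputed data mentioned in the caption: the zeros have been replaced by plausible sub-threshold values rather than by arbitrary random numbers.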


3 Results and discussion

Using simulated and real experimental mass cytometry data, we validated the cell clustering, cell population tracking and change-point detection performance of CYBERTRACK2.0. First, we conducted a simulation study by generating a synthetic longitudinal mass cytometry dataset that includes rare cell populations alongside larger ones (proportions from 1% to 30%) (Supplementary Figs S3 and S4). In addition, we show that imputing missing values (zeros) by Gibbs sampling yields approximate mean expression levels below the detection limit of mass cytometry. Using these synthetic data, we compared the performance of CYBERTRACK2.0 with that of the original version of CYBERTRACK and a Gaussian mixture model (GMM). We confirmed that CYBERTRACK2.0 clusters cells better than the other methods (Supplementary Fig. S4). In addition, using pseudo-longitudinal data generated from ground-truth mass cytometry data, we compared the clustering performance of our model with that of FlowSOM and PhenoGraph (Levine et al.; Van Gassen et al.). The clustering performance of CYBERTRACK2.0 is better than or comparable to that of these state-of-the-art methods (Supplementary Figs S7 and S8). These simulation studies confirmed that CYBERTRACK2.0 has high clustering performance. Furthermore, the ability of CYBERTRACK2.0 is not restricted to clustering: our method produces reasonable estimates for the zero-inflated multivariate Gaussian mixture distribution, accurately tracks cell population dynamics and can detect change-points (Supplementary Fig. S6). Zero replacement by Gibbs sampling also provides imputed data for other downstream analyses. For detailed information on the simulation study, see Supplementary Material. Next, we validated the performance of CYBERTRACK2.0 using two real longitudinal mass cytometry datasets on cancer immunology and hematopoietic development (Krieg et al.; Palii et al.).
Overall, the cell populations detected using our method were in agreement with well-known cell lineages. Importantly, the method captured populations ranging from major to very rare, which supports its effectiveness in practical situations. In the cancer immunology data, CYBERTRACK2.0 showed the enrichment of HLA-DR+ myeloid cells in patients responsive to anti-PD-1 treatment (Supplementary Figs S9–S11). Furthermore, analysis by CYBERTRACK2.0 revealed that the treatment triggers different dynamics among HLA-DR+ myeloid clusters, which may lead to a more precise characterization of this potential prognostic marker population (Supplementary Figs S9–S11). For the hematopoietic development data, CYBERTRACK2.0 was able to systematically analyze the dynamic emergence of cell lineages from hematopoietic stem and progenitor cells to erythrocytes and megakaryocytes (Supplementary Fig. S14), consistent with the original report (Palii et al.). For a detailed explanation of these results, see Supplementary Material. In summary, we proposed CYBERTRACK2.0, a novel statistical framework for longitudinal mass cytometry data analysis. It is based on a topic tracking model and a zero-inflated multivariate Gaussian mixture distribution, and it addresses two previously unsolved problems: (i) clustering of cells with longitudinal constraints and (ii) utilization of zeros in mass cytometry data. In addition, weighted iterative sampling was implemented in our method to maximize the chances of detecting rare cell populations of interest. Furthermore, users can use data imputed by CYBERTRACK2.0 for other downstream analyses such as pseudotime estimation or batch effect removal. We believe that CYBERTRACK2.0 is a powerful tool for researchers aiming to obtain biological or clinical insights from longitudinal mass cytometry data.
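The intuition behind weighted resampling for rare populations can be sketched as below. This is a loose, one-dimensional illustration in the spirit of weighted iterative sampling, not the modified algorithm implemented in CYBERTRACK2.0: the function names, the choice of which clusters are frozen, and the weight (posterior mass assigned to non-frozen clusters) are assumptions made for this example.

```python
import random
from statistics import NormalDist


def resampling_weights(values, clusters, fixed):
    """Weight each cell by the posterior mass it gives to non-fixed clusters.

    `clusters` is a list of (mixing_weight, mean, sd) tuples; `fixed` holds the
    indices of large clusters whose parameters are treated as frozen.
    """
    weights = []
    for x in values:
        dens = [w * NormalDist(mu, sd).pdf(x) for w, mu, sd in clusters]
        total = sum(dens)
        if total == 0.0:
            # Explained by no cluster at all: keep it fully eligible.
            weights.append(1.0)
            continue
        fixed_mass = sum(dens[k] for k in fixed) / total
        # Guard against tiny negative values caused by rounding.
        weights.append(max(0.0, 1.0 - fixed_mass))
    return weights


def weighted_resample(values, weights, n, seed=0):
    """Draw n cells with replacement, favoring poorly explained cells."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)
```

Cells that the large, frozen clusters already explain well receive weights close to zero, so the resampled set is enriched for cells from candidate rare populations, and a subsequent fit on that set is less likely to lump them into a neighboring large cluster.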

Funding

This research was supported by JSPS Grant-in-Aid for Scientific Research under grant Nos. 18H04798, 19H05210, 20H04841 and 20H04281. It was also supported by the Japan Agency for Medical Research and Development (AMED) under grant Nos. JP19dm0107087h0004, JP19km0405207h9904 and JP19ek0109281h0003. The super-computing resources were provided by the Human Genome Center, the University of Tokyo.

Conflict of Interest: none declared.

Data availability

Our code is available at https://github.com/kodaim1115/CYBERTRACK2. Pseudo-longitudinal data were generated from data provided at https://flowrepository.org/id/FR-FCM-ZZPH. Mass cytometry data on cancer immunity are available at https://flowrepository.org/experiments/1124. Mass cytometry data on hematopoiesis are available at https://flowrepository.org/id/FR-FCM-ZYPT.
  9 in total

1.  FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.

Authors:  Sofie Van Gassen; Britt Callebaut; Mary J Van Helden; Bart N Lambrecht; Piet Demeester; Tom Dhaene; Yvan Saeys
Journal:  Cytometry A       Date:  2015-01-08       Impact factor: 4.355

2.  Single-Cell Proteomics Reveal that Quantitative Changes in Co-expressed Lineage-Specific Transcription Factors Determine Cell Fate.

Authors:  Carmen G Palii; Qian Cheng; Mark A Gillespie; Paul Shannon; Michalina Mazurczyk; Giorgio Napolitani; Nathan D Price; Jeffrey A Ranish; Edward Morrissey; Douglas R Higgs; Marjorie Brand
Journal:  Cell Stem Cell       Date:  2019-03-14       Impact factor: 24.633

Review 3.  Systems immune monitoring in cancer therapy.

Authors:  Allison R Greenplate; Douglas B Johnson; P Brent Ferrell; Jonathan M Irish
Journal:  Eur J Cancer       Date:  2016-05-04       Impact factor: 9.162

Review 4.  Mass Cytometry: Single Cells, Many Features.

Authors:  Matthew H Spitzer; Garry P Nolan
Journal:  Cell       Date:  2016-05-05       Impact factor: 41.582

5.  Analysis of Immune Signatures in Longitudinal Tumor Samples Yields Insight into Biomarkers of Response and Mechanisms of Resistance to Immune Checkpoint Blockade.

Authors:  Pei-Ling Chen; Whijae Roh; Alexandre Reuben; Zachary A Cooper; Christine N Spencer; Peter A Prieto; John P Miller; Roland L Bassett; Vancheswaran Gopalakrishnan; Khalida Wani; Mariana Petaccia De Macedo; Jacob L Austin-Breneman; Hong Jiang; Qing Chang; Sangeetha M Reddy; Wei-Shen Chen; Michael T Tetzlaff; Russell J Broaddus; Michael A Davies; Jeffrey E Gershenwald; Lauren Haydu; Alexander J Lazar; Sapna P Patel; Patrick Hwu; Wen-Jen Hwu; Adi Diab; Isabella C Glitza; Scott E Woodman; Luis M Vence; Ignacio I Wistuba; Rodabe N Amaria; Lawrence N Kwong; Victor Prieto; R Eric Davis; Wencai Ma; Willem W Overwijk; Arlene H Sharpe; Jianhua Hu; P Andrew Futreal; Jorge Blando; Padmanee Sharma; James P Allison; Lynda Chin; Jennifer A Wargo
Journal:  Cancer Discov       Date:  2016-06-14       Impact factor: 39.397

6.  Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis.

Authors:  Jacob H Levine; Erin F Simonds; Sean C Bendall; Kara L Davis; El-ad D Amir; Michelle D Tadmor; Oren Litvin; Harris G Fienberg; Astraea Jager; Eli R Zunder; Rachel Finck; Amanda L Gedman; Ina Radtke; James R Downing; Dana Pe'er; Garry P Nolan
Journal:  Cell       Date:  2015-06-18       Impact factor: 41.582

7.  High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy.

Authors:  Carsten Krieg; Malgorzata Nowicka; Silvia Guglietta; Sabrina Schindler; Felix J Hartmann; Lukas M Weber; Reinhard Dummer; Mark D Robinson; Mitchell P Levesque; Burkhard Becher
Journal:  Nat Med       Date:  2018-01-08       Impact factor: 87.241

8.  SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 1: algorithm design.

Authors:  Iftekhar Naim; Suprakash Datta; Jonathan Rebhahn; James S Cavenaugh; Tim R Mosmann; Gaurav Sharma
Journal:  Cytometry A       Date:  2014-02-14       Impact factor: 4.355

9.  Model-based cell clustering and population tracking for time-series flow cytometry data.

Authors:  Kodai Minoura; Ko Abe; Yuka Maeda; Hiroyoshi Nishikawa; Teppei Shimamura
Journal:  BMC Bioinformatics       Date:  2019-12-27       Impact factor: 3.169
