
Key steps and methods in the experimental design and data analysis of highly multi-parametric flow and mass cytometry.

Paulina Rybakowska¹, Marta E Alarcón-Riquelme¹,², Concepción Marañón¹.

Abstract

High-dimensional single-cell technologies have revolutionized the way biological systems are studied, and polychromatic flow cytometry (FC) and mass cytometry (MC) are two of the drivers of this revolution. As up to 30 and 50 dimensions, respectively, can be measured per single cell, they allow deep phenotyping combined with the study of cellular functions such as cytokine production or protein phosphorylation. In parallel, the bioinformatics field is developing algorithms that can process the incoming data and extract the most useful and meaningful biological information. However, the success of automated analysis tools depends on the generation of high-quality data. In this review we present the most recent FC and MC computational approaches used to prepare, process and interpret high-content cytometry data. We also underscore proper experimental design as a key step for obtaining good-quality data.


Keywords: Bioinformatics; Computational tools; Flow cytometry; Mass cytometry; Single-cell proteomics

Year: 2020 PMID: 32322369 PMCID: PMC7163213 DOI: 10.1016/j.csbj.2020.03.024

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

High-throughput single-cell technologies are becoming common approaches in daily research. The impressive increase in the number of different molecules that can be measured in a single cell has changed the way experiments are done and analyzed. Flow and mass cytometry (FC and MC, respectively) are great examples of these changes. Starting from the first flow experiments, which measured 2–4 manually gated markers, multiplexing capabilities have increased to 30 [1] and 45 [2] parameters in FC and MC, respectively, and strong bioinformatics skills are needed to extract meaningful information. The general concepts of both technologies are similar: antibodies or probes labeled with fluorochromes (FC) or high-atomic-mass elements (MC) are used to target the desired antigens or biomolecules and to characterize cell properties such as phenotype, cell cycle [3] or response to stimulation agents via cytokine production, protein phosphorylation [4] or RNA expression [5], among others. Following staining, cells are introduced as a single-cell suspension via capillary tubes into the flow cytometer for FC, or into a Cytometry by Time-Of-Flight (CyTOF, Helios) device for MC. The biological information is obtained at single-cell resolution from photons (FC) or from the time-of-flight of ions, which reflects their mass-to-charge ratio (MC); it is converted into digital values and stored in the same file format, the Flow Cytometry Standard (.FCS). Although both technologies are commonly used to measure cell properties, the definition of an event differs. In FC, every event that emits light and reaches the user-defined threshold is stored in the FCS file. Both light scatters, FSC (forward scatter), correlated with cell size, and SSC (side scatter), correlated with cell granularity, are used together with fluorescence to differentiate single cells from noise [6].
In MC, an ion cloud that lasts for more than 10 and less than 150 pushes (spectrum scans) and exceeds the lower convolution threshold is recorded in the FCS file as an event. MC lacks light scatter parameters, thus cell events are defined using the metals associated with them in the form of antibodies or probes [7]. Nucleic acid intercalators such as iridium (Ir) or rhodium (Rh) are used to define nucleated cells, whereas antigen-specific markers must be used for non-nucleated cells. In FC, light can excite some cell components, such as flavins, folic acid and retinol, which emit so-called autofluorescence, especially in the green spectrum [8]. This is not usually a problem in MC, since the high-atomic-mass metals detected are not frequently found within cells. However, tissue metal contamination due to environmental exposure, medical procedures or experimental protocols has been reported [9], [10], [17] and should also be considered when deciding on the most suitable technology, FC or MC. Both techniques benefit from the development of new probes that increase the number of measured parameters. The high dimensionality of the data has changed the way results are visualized, from manually building two-dimensional gating hierarchies to applying automated clustering or dimensionality reduction methods. Automation requires properly preprocessed, high-quality input data free of artifacts. Artifacts may be introduced during sample collection, processing and acquisition, and should be detected and removed. Although these alterations come from different sources in FC and MC, they have a similar impact on data quality. In this review we present the most common artifacts and their sources during data preparation and acquisition and show how to manage them (Table 1). Additionally, we summarize the algorithms and workflows that can improve data quality prior to feature extraction, as an update of previously published reviews [11], [12], [13].
Furthermore, we introduce state-of-the-art clustering and visualization tools that can be applied to the data and point out their strengths and limitations. Finally, we present the main approaches to analyzing the extracted features in the context of biomarker discovery and trajectory inference studies. An overview of the computational methods discussed is presented in Table 2. Fig. 1 shows a typical workflow for the preparation and analysis of multi-parametric FC and MC data.

Table 1

Artifacts and their prevention in high-dimensional flow and mass cytometry data preparation and analysis.

Stage	Source	Effect/Artifact	Prevention
Experimental	Change in reagent batches, e.g. a) change in fixation reagent; b) change in antibody lot	General change in protocol performance that can introduce a batch effect; unpredicted contamination (important in MC studies); a) affects staining and cell recovery; b) different fluorochrome or metal conjugation efficiency results in different antibody background and staining intensity	Order the quantity needed for the whole experiment if product stability allows it; re-test each new lot and confirm similar/good performance
	Change in the protocol, e.g. a) use of fixed vs fresh blood; b) change in centrifugation steps; c) change in staining temperature or time	a) Change in sample stability; change in antibody performance; change in sample background; b) different cell recovery; c) different antibody background due to inefficient or different washing steps; instability of fluorochrome-conjugated tandems at RT in light	Decide on the staining approach before sample collection; optimize the protocol and prepare an SOP; use exactly the same protocol as was used for antibody and barcoding optimization and cocktail preparation, including cell preparation and antibody staining (RT vs 4°C, dark vs light)
	Change in cocktail preparation, e.g. lack of one antibody, wrong fluorochromes, different clone	Different staining intensity; different staining pattern; problems with population definition if one of the markers is missing	Prepare one large antibody cocktail, aliquot it and store it frozen (in MC), lyophilized or desiccated (MC and FC)
	Pipetting errors	Variation in staining intensity, especially problematic when MI will be compared; variation in the number of collected events	Barcode the samples and process them in batches; create as few batches as possible; include a reference sample to track variation in each batch
	Improper antibody titration	Too little: problems with population definition; no split between positive and negative values. Too much: unspecific binding of the antibodies; values out of or at the edge of the dynamic range	Perform titration experiments; use tools like AOF or calculate the staining index to ensure correct titration
	Unspecific staining	Unknown co-expression of markers; high, “weird” signal in most of the channels	Block Fc receptors; block hydrophobic binding using heparin; use live/dead staining to exclude dead cells, which have high antibody-binding properties
	Incorrect panel design	High spreading error that can mask dim populations; inability to define the needed population due to the lack of a proper cell definition; significant signal spillover, especially in MC	Use tools like Guided Panel Solution or Maxpar Panel Designer to design the panel; use databases like [20] to carefully select appropriate markers for the needed cell populations; use published panels
	Metal contamination (only in the case of MC)	Unspecific signal registered as events in the .FCS file; crosstalk of the contaminant with different channels; stickiness of the contaminant to the capillary, causing clogging; shorter detector lifespan; cone contamination and loss of CyTOF sensitivity	Troubleshooting will depend on the source of contamination. Reagent contamination: check all reagents by running them in solution mode at the proper concentration as stated in [134]; avoid using autoclaved glass as it can contain barium contaminants; always use filter tips; use previously tested references in MC studies. Sample contamination (due to medical procedure [10], [135], environmental exposure [9] or experimental protocol [17]): be aware of possible contamination; decide if MC can be used; if contamination is possible, screen small aliquots of samples as shown in [139]; if contamination is discovered during acquisition, dilution of the sample can be considered

Acquisition	a) Improper cytometer calibration;b) Device decalibration upon acquisition	a) Loss in antigen resolution;Intensity changes across the runs;Run-to-run variation;b) Decrease in the signal intensity	Check your machine performance and get to know its resolution;Calibrate device;Control for time changes in the machine calibration;Use calibration beads to correct for signal drop and changes;Include reference samples in every batch
	Clogging of the device	Signal instability affecting median expression of the markers	Clean device when necessary to remove the clog
	High acquisition speed or changes in acquisition speed	Change in the doublet-to-singlet ratio; increase in coefficients of variation (CVs) together with the sample speed	Keep the speed constant and adequate to obtain a good singlet-to-doublet ratio
	Different sample or panel labeling	Errors when analyzing the files or inability to read in the files	Define a labeling strategy, create a template and keep it constant across the whole project

Analysis	Not enough statistical power	Lack of significance	Calculate statistical power; consult a statistician; include more samples
	Improper transformation	Inability to distinguish positive and negative events; improper clustering	Verify the transformation method by visualizing markers, e.g. using FlowJo; optimize the transformation
	Batch effect	Improper data interpretation; incorrect conclusions	Visualize the batch effect, e.g. using dimensionality reduction tools like PCA or t-SNE; properly design the experiment, e.g. include a reference sample; correct for the batch effect when building the model
	Improper normalization	Changes in marker expression distribution; improper assignment to the clusters	Carefully verify the performance of normalization tools by visualizing the markers
	Uncleaned changes in signal intensities	Improper assignment to the clusters	Spot problematic files by applying tools like AOF; normalize them or discard them from the analysis
	Uncleaned bad quality events	Improper assignment to the clusters	Verify the signal stability and clean if necessary using tools like flowAI, flowClean, flowCut or manual gating; remove doublets, debris, and dead cells
	Presence of doublets/high doublet-to-singlet ratio	Biologically incorrect co-expression of markers; improper assignment to the clusters	Gate out all the doublets
	Improper clustering or dimensionality reduction performance	Unstable clusters or cell positions; different results with every clustering run	Verify clustering settings by gating a few example files and calculating the F1 score; check for run-to-run stability; check for sensitivity to subsampling

Table 2

Overview of bioinformatics tools for high-dimensional flow and mass cytometry data analysis.

Category	Tool	Application	Source
Panel design	Guided Panel Solution[39]	Panel design in Flow Cytometry	BD Biosciences
Panel design	Maxpar Panel Design [40]	Panel design in Mass Cytometry	Fluidigm

Quality control/preprocessing	Average overlap frequency (AOF) [45]	Antibody performance evaluation	R package/Bioconductor package;Astrolab
	CATALYST [51]	MC data preprocessing (bead-based normalization; debarcoding; compensation; FlowSOM clustering)	R package/Bioconductor package; interactive Shiny-based web application
	flowAI [61]	Signal Cleaning;Flow rate cleaning;Outliers cleaning	R package/Bioconductor package;GUI;Plugin FlowJo Exchange
	flowClean [63]	Signal Cleaning	R package/Bioconductor package;Plugin FlowJo Exchange
	flowCore [65]	Basic structures for flow cytometry data	R package/Bioconductor package;
	flowCut [62]	Signal cleaning	R package (github repository)
	flowTrans [58]	Data transformation	R package/Bioconductor package
	flowVS [59]	Data transformation	R package/Bioconductor package
	flowWorkspace [70]	Representation and interaction with gated and ungated data in R	R package/Bioconductor package
	Single-cell deconvolution algorithm [64]	Debarcoding	CATALYST; Matlab; Fluidigm stand-alone; Updated Single-cell Debarcoder [140]; R package PREMESSA (GitHub repository)
	Optimal Sample Assignment Tool (OSAT) [22]	Sample to batch allocation	R package/Bioconductor package

Normalization and batch effect correction	gaussNorm [76]	Normalization	R package/Bioconductor package flowStats
	fdaNorm [76], [77]	Normalization	R package/Bioconductor package flowStats
	CytoNorm [79]	Normalization using reference sample	R package/Bioconductor package; Plugin FlowJo Exchange
	CytofBatchAdjust [80]	Normalization using reference sample	R package/Bioconductor package
	BatchEffectRemoval [78]	Normalization using reference sample	Python

Dimensionality reduction	Diffusion Maps [91], [101], [103]	Non-linear dimensionality reduction/Trajectory inference	R package/Bioconductor package;
	Isomap [89], [90]	Non-linear dimensionality reduction/Trajectory inference	R package/CRAN package vegan
	PCA [83]	Linear dimensionality reduction	R package stats
	t-SNE [84]	Non-linear dimensionality reduction	R package/CRAN package;Plugin FlowJo Exchange; Cytobank; Matlab; Python
	BH-SNE (viSNE) [92], [97]	Non-linear dimensionality reduction	Python; Cytobank; Matlab; R package/CRAN package Rtsne
	UMAP [88]	Non-linear dimensionality reduction	Python, R package/CRAN package uwot; Plugin FlowJo Exchange;
	One-SENSE [105]	Non-linear dimensionality reduction	R package/Bioconductor package
	HSNE [85]	Non-linear dimensionality reduction	Cytosplore
	FIt-SNE [86]	Non-linear dimensionality reduction	R package; Matlab; Python; Plugin FlowJo Exchange
	EmbedSOM [112]	Non-linear dimensionality reduction	R package/CRAN package ; Plugin FlowJo Exchange
	opt-SNE [87]	Non-linear dimensionality reduction	Python; Cloud opt-SNE
	Jensen-Shannon (JS) divergence [92]	Dimensionality reduction comparison	R package/Bioconductor package cytutils

Data clustering and automated gating	CytoCompare [110]	Clustering comparison	R package (github repository)
	flowClust [136]	Unsupervised Clustering	R package/Bioconductor package;GenePattern Platform [137]
	flowDensity [73]	Supervised clustering	R package/Bioconductor package
	flowLearn [71]	Semi supervised Clustering	R package (github repository)
	FlowSOM [81]	Unsupervised Clustering	R package/Bioconductor package, Cytofkit [128]; Plugin FlowJo Exchange
	flowType [123]	Unsupervised Clustering	R package/Bioconductor package
	PhenoGraph [114]	Unsupervised Clustering	Matlab; Python; R package Rphenograph (github repository); Cytofkit [128]; Plugin FlowJo Exchange
	SPADE [44]	Unsupervised Clustering	R package/Bioconductor package; Cytobank, Matlab
	X-shift [115]	Unsupervised Clustering	Standalone application (VorteX); Plugin FlowJo Exchange
	DensVM [90]	Unsupervised Clustering	R package/Bioconductor package cytofkit
	ACCENSE [117]	Unsupervised Clustering	Standalone ACCENSE application

Useful pipelines and approaches	CellCNN [126]	Representation learning approach to detect rare cell subsets associated with disease	Python
	Citrus [125]	Unsupervised clustering with regularized regression model	R package with GUI; Cytobank
	cydar [119]	Unsupervised assignment to hyperspheres, control of the spatial false discovery rate, changes in abundance visualization	R package/Bioconductor package,GUI with Shiny application
	Cytofast [127]	Visual and quantitative analysis of cytometry data to discover immune signatures and correlations	R package/Bioconductor package
	Cytosplore [129]	Interactive visual analysis system containing t-SNE, HSNE and SPADE	Interactive tool
	Cytofkit [128]	Preprocessing; cell subset detection (DensVM, FlowSOM or Phenograph, ClusterX); data visualization (PCA, t-SNE, Isomap)	R package/Bioconductor package,GUI with Shiny application
	diffcyt [56]	Unsupervised clustering with FlowSOM, empirical Bayes moderated tests for statistical analysis	R package/Bioconductor package
	flowType/RchyOptimyx [122]	Unsupervised clustering; construction of a cell hierarchy for maximization of an external variable	R package/Bioconductor package
	FloREMI [121]	Preprocessing; feature extraction;feature selection; survival time prediction	R scripts (github repository)
	CyTOF workflow [54]	Unsupervised clustering with FlowSOM, generalized linear mixed models or linear mixed models	R package/Bioconductor package
	DAMACY [96]	Multivariate method based on PCA and multivariate regression based on Partial Least Squares (PLS)	Matlab
	OpenCyto [138]	Facilitate the automated gating methods	Bioconductor package/GUI with shinyCyto application

Trajectory detection	pCreode [133]	Trajectory inference with multiple branching	Python
	Wanderlust [131]	Trajectory inference without branching	Matlab based interactive tool cyt
	Wishbone [132]	Trajectory inference with two branches	Python/Matlab based interactive tool cyt

Fig. 1

The flow and mass cytometry experimental and computational data analysis workflow. For more information about which tools to use and how to design each step well to avoid artifacts, refer to Table 1 and Table 2.


Obtaining reproducible and high-quality data

To obtain sufficient statistical power in both experimental studies and evaluation cohorts, sample size estimation is a key step in the design of a cytometry project. Knowing the sample size upfront also allows enough reagents, including the antibody cocktail, to be planned for the whole study, which prevents changes in reagent batches and avoids introducing additional variability. A Standard Operating Procedure (SOP) for sample collection and processing is highly recommended, as it significantly improves data reproducibility [14], [15], [16]. For MC, the selection of reagents and their storage is critical to avoid metal contamination (see Table 1 for possible contamination sources) [17]. It is essential to consider whether cells should be stained immediately upon collection or preserved until recruitment is completed. If all the samples are obtained at once, they can be stained and acquired immediately. However, in longitudinal studies, or if the cytometry unit is far from the recruitment center, sample preservation before [19] or after staining [18] should be considered. The goal is to process, stain and acquire as many samples as possible with the same protocol, antibody cocktail, and instrument settings. Each preservation protocol will affect the sample composition and antigen expression [18], [20], [21]; hence, benefits and drawbacks will depend on the biological question and should be carefully considered before performing the experiments. Often, hundreds of samples are included in cytometry studies and are split into multiple experimental batches. This can introduce “batch effects”, defined as non-biological differences between batches. To minimize this effect, a careful experimental design should ensure the even distribution of biological groups and confounding factors across batches [22]. Packages like OSAT (Optimal Sample Assignment Tool) [22] can be used to optimally distribute the samples into batches.
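As a minimal sketch of the idea behind balanced sample-to-batch allocation, the following Python snippet shuffles the samples within each biological group and deals them into batches in round-robin order, so that group counts per batch differ by at most one. It is not the OSAT algorithm [22], which can balance several variables at once; the function name `assign_batches`, the input layout and the example cohort are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Spread each biological group as evenly as possible across batches.

    samples: list of (sample_id, group) tuples (hypothetical input layout).
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    next_batch = 0  # the rotation continues across groups to balance batch sizes
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # randomize which sample lands in which batch
        for sample_id in ids:
            assignment[sample_id] = next_batch % n_batches
            next_batch += 1
    return assignment


# Hypothetical cohort: 12 patients and 12 controls, processed in 4 batches
samples = [(f"P{i:02d}", "patient") for i in range(12)] + \
          [(f"C{i:02d}", "control") for i in range(12)]
batches = assign_batches(samples, n_batches=4)

group_of = dict(samples)
counts = Counter((batch, group_of[sid]) for sid, batch in batches.items())
for key in sorted(counts):
    print(key, counts[key])
```

In a real study, further confounding factors (for example sex, age or collection site) would also need to be balanced, which is where a dedicated tool becomes useful.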
Antibody labeling and sample staining should be consistent across all batches, as discrepancies can introduce technical differences in mean intensity (MI) values that are hard to distinguish from biologically meaningful information. This is why strict control of intra- and inter-batch variation should be built into the experimental design. To limit intra-batch variation, barcoding (labeling of individual cell samples with unique combinatorial barcodes) and sample pooling before antibody staining are used, particularly in MC [23], [24], [25], [64], and less often in FC [26], [27]. To minimize inter-batch variation, it is recommended to prepare a single master mix of the staining cocktail, stable for the duration of the experiment, and to use it throughout the project. Both lyophilized and desiccated antibody cocktails have been reported [20], [28], [29], and freezing of MC cocktail aliquots has also been shown to be successful [30]. Unfortunately, even a well-prepared SOP minimizes, but does not resolve, the problems with day-to-day reproducibility. Thus, measures allowing the estimation and correction of batch effects are needed. The practice of including a reference sample in each barcoded batch is becoming standard in MC [31] and has been reported in FC experiments as well [32]. The reference sample is taken from a larger volume obtained from one donor at a particular time, aliquoted, and preserved. It carries the information on the technical variability introduced during sample preparation, staining and acquisition, and therefore allows run-to-run variation to be measured [31]. In FC and MC, panel optimization is the most critical and difficult step. Both technologies require proper assignment of dim and bright markers depending on channel sensitivity and performance in terms of staining index and spillover [1], [33], [34].
The success of automated methods in resolving cell populations depends more on well-selected markers than on the frequency of the cells, thus the probes should be selected carefully [35]. To identify the markers of interest, a recently published antibody staining database could be useful, as it contains staining patterns for 350 antibodies used in fresh and fixed peripheral blood mononuclear cells (PBMC) [20]. Additionally, antibody titration, done under the same conditions as the final experiment, is essential to ensure a signal intensity that allows population definition. It should be stressed that if a population cannot be defined by manual inspection due to a sub-optimal amount of antibody, it will not be detected by most clustering algorithms [35]. In both techniques, signal spills from one channel into another. In FC this is caused by the overlapping emission spectra of different fluorochromes. In MC it can be due to metal impurities in the metal tags; to metal oxidation, which mainly affects light lanthanides and causes signal spillover into heavier masses; or to metal over-abundance, when an inappropriately high antibody concentration is used and the signal of that particular mass cannot be resolved [36]. In FC the signal crosstalk can be severe and cannot be avoided in multicolor experiments. In MC, maximum spillover does not exceed a few percent, and proper panel design can minimize these issues. Inadequate panel design or a lack of proper compensation controls, especially in FC, can create false-positive events [37]. Additionally, it can introduce spreading error, an artifact produced by the error in photon counting [6], which can mask dim or low-fluorescence positive cells.
As a higher number of markers requires more sophisticated panel design skills, tools like Guided Panel Solution offered by BD [39] or Maxpar Panel Designer by Fluidigm [40] can be helpful but are not sufficient, especially in FC, where spreading error information is not provided. Spreading error depends on laser configuration, dye brightness and the quality of the photomultiplier tubes (PMT). Thus, careful selection of probes and a deep understanding of the cytometer configuration and its performance are critical in FC [41], [42]. For MC it is also important to be familiar with the instrument performance, as variation in sensitivity and resolution has been observed between different CyTOF devices [43]. During the preparation of the SOP, a pilot study including a few samples is strongly recommended, as it can help to fix the limitations of the protocol [28], [43]. Evaluation of antibody staining, titration, and signal spillover is an important but time-consuming process, especially in high-dimensional approaches. Fortunately, a recent study shows that clustering algorithms like SPADE (Spanning-tree Progression Analysis of Density-normalized Events) [44] can be used to evaluate the titration of a panel and track spillover artifacts. Additionally, metrics like Average Overlap Frequency (AOF) can be applied to verify antibody performance by calculating the staining distance between the positive and negative populations, substantially reducing the time needed to calculate and plot staining indices [45]. This shows that even at the panel optimization stage, computational approaches can significantly accelerate bench work and improve data quality. For more details about panel preparation and standardization, readers are directed to the following literature [46], [47], [48], [49]. The capillary introduction system in both FC and MC suffers from cell clogging, which alters the flow rate and signal quality over the time of acquisition.
Sample clogs depend on the biological material, which ranges from “easily” acquired cell lines or PBMC, through whole blood, to disaggregated tissue, the most prone to clogging. In both technologies, disturbances in the acquisition rate affect signal quality. The higher the speed, the more coincident events, known as doublets, are collected and the greater the spread of the signal [6]. The maximum recommended acquisition speed is 25 000 cells/s for FC and up to 1000 cells/s for MC [50]. It should be noted that the maximum speed depends on the type of cells acquired and on the experimental target. If rare cells at a frequency of around 0.01% are of interest, the flow rate should be lower and well optimized [6]. For more information about frequent errors and solutions in the experimental part of the workflow, readers are directed to Table 1.
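The staining index mentioned above in the context of antibody titration can be illustrated with a short Python sketch. It assumes one common definition, the difference between the median intensities of the positive and negative populations divided by twice the standard deviation of the negative population; the function name `staining_index` and the titration values are hypothetical and serve only to show how the dilution with the best separation could be picked.

```python
import statistics


def staining_index(positive, negative):
    """One common definition: (median_pos - median_neg) / (2 * SD_neg)."""
    spread = statistics.stdev(negative)
    if spread == 0:
        raise ValueError("negative population has zero spread")
    separation = statistics.median(positive) - statistics.median(negative)
    return separation / (2 * spread)


# Hypothetical titration series: dilution -> (positive, negative) intensities
titration = {
    "1:50": ([900, 950, 1000, 1050, 1100], [40, 60, 80, 100, 120]),
    "1:100": ([800, 850, 900, 950, 1000], [20, 30, 40, 50, 60]),
    "1:200": ([300, 350, 400, 450, 500], [15, 25, 35, 45, 55]),
}

indices = {
    dilution: staining_index(pos, neg)
    for dilution, (pos, neg) in titration.items()
}
best = max(indices, key=indices.get)  # dilution with the highest staining index
print(indices, best)
```

In this made-up series the most concentrated dilution is not the best choice, because the higher background widens the negative population more than it raises the positive signal.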

Prior to feature analysis: data preprocessing and quality controls

Data compensation and transformation

As stated before, both FC and MC suffer from signal crosstalk across detection channels. To obtain correct data, a compensation matrix needs to be calculated using appropriate controls [1], [51]. While proper MC panel design can minimize spillover issues [37], spillover is almost inevitable in standard polychromatic FC above 15 colors. However, as pointed out by Leipold [52], minimal spillover is not equal to zero spillover, so MC data might also require correction. As mentioned before, the reason for signal crosstalk differs between FC and MC; however, in both technologies the spill can be described as a linear function of signal intensity and can therefore be corrected using spillover coefficients for each channel [51], [53]. Although this method works for standard FC, in MC the correction introduces negative values, which are normally almost absent from MC data. As an alternative, the non-negative least-squares (NNLS) approach used in spectral cytometry has been applied to MC data [51]. If proper single-stained controls and unstained samples are provided, compensation can be calculated automatically using platforms like Diva or FlowJo for FC and the CATALYST package for MC [51]. FC and MC raw data are often characterized by skewed distributions with varying ranges of expression. Consequently, it can be difficult to distinguish positive and negative populations [54]. As visualization and clustering performance depend on scale and distribution, it is important to bring the expression peaks as close to a normal distribution as possible [55]. To do so, the expression values are usually transformed using an inverse hyperbolic sine (arcsinh) transformation with a cofactor of 5 for MC and 150 for FC [56]. The arcsinh transformation behaves similarly to a log transformation at high values but is approximately linear near zero, and the cofactor controls the width of the linear region.
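The behavior of the arcsinh transformation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `arcsinh_transform` and the raw values are illustrative assumptions, while the cofactors are the ones quoted above.

```python
import math


def arcsinh_transform(values, cofactor):
    """Apply asinh(x / cofactor) to every value."""
    return [math.asinh(v / cofactor) for v in values]


raw = [-10, 0, 5, 50, 500, 5000]  # hypothetical raw intensities

mc_scaled = arcsinh_transform(raw, cofactor=5)    # cofactor quoted for MC
fc_scaled = arcsinh_transform(raw, cofactor=150)  # cofactor quoted for FC

for x, mc, fc in zip(raw, mc_scaled, fc_scaled):
    print(f"{x:>6}  MC: {mc:7.3f}  FC: {fc:7.3f}")

# Near zero the transform is roughly linear: asinh(x / c) is close to x / c.
linear_gap = abs(math.asinh(5 / 150) - 5 / 150)

# At high values it approaches a logarithm: asinh(x / c) is close to log(2x / c).
log_gap = abs(math.asinh(5000 / 5) - math.log(2 * 5000 / 5))
print(linear_gap, log_gap)
```

A larger cofactor widens the region treated as approximately linear, which is why the same raw value is compressed far less with a cofactor of 150 than with a cofactor of 5.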
FC data contain more negative values due to the correction of background noise, autofluorescence, and compensation; conversely, MC data contain zero values when no ions are detected, and a few negative values are introduced by background subtraction and randomization [56], [57]. The type of transformation can be sample- and marker-specific, especially in FC data, as shown in [58], [59], and the choice of parameters can be optimized automatically by tools like flowTrans and flowVS. It should be noted that some visualization and clustering tools require the transformation to be done upfront, while others perform it by default. It is important to always check the transformation requirements, as they might affect the downstream analysis.
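Returning to the spillover correction described earlier in this section, the linear model can be sketched for two channels in plain Python. The example inverts a 2x2 spillover matrix and shows how an event whose second channel reads below the expected spill ends up with a negative compensated value, the behavior that motivates NNLS for MC data. The matrix layout, the coefficients and the function name `compensate_two_channels` are assumptions made for illustration; this is neither the CATALYST implementation nor NNLS itself.

```python
def compensate_two_channels(observed, spill):
    """Invert a 2x2 spillover matrix and apply it to one event.

    observed: (ch0, ch1) measured intensities.
    spill[i][j]: fraction of the signal of label i detected in channel j,
                 with 1.0 on the diagonal (layout assumed for this sketch).
    """
    (a, b), (c, d) = spill
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    o0, o1 = observed
    # Solve observed = true x spill for the true signals
    true0 = (o0 * d - o1 * c) / det
    true1 = (o1 * a - o0 * b) / det
    return true0, true1


spill = [[1.00, 0.03],   # label 0 spills 3% into channel 1
         [0.01, 1.00]]   # label 1 spills 1% into channel 0
event = (100.0, 2.0)     # channel 1 reads below the expected 3% spill

compensated = compensate_two_channels(event, spill)
print(compensated)  # the second value is slightly negative
```

Real panels involve many channels and noisy measurements, so the general case is solved as a matrix problem over all channels at once rather than with the closed-form 2x2 inverse used here.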

Signal quality check and cleaning

As mentioned in Section 2, the capillary tubes used for sample introduction in FC and MC can clog, resulting in sudden changes in the signal. Other issues, such as unstable data acquisition, can cause signal shifts and change the mean intensity [60]. These signal disturbances affect downstream analysis and should be identified and removed from the data. Currently, three algorithms can be used for this: flowAI [61], which uses change point analysis and allows automatic or interactive analysis; flowCut [62], which creates summed density measures using the mean, median, percentiles, variation and skewness, and removes events based on density curve analysis; and flowClean [63], which tracks changes in the frequency of artificially created populations, taking advantage of compositional and change point analysis and flagging outliers with unusual ratios of cell populations. The first two methods are fully automated, while flowClean represents a semi-automated approach. In all methods, the signal check is performed for every channel across the acquisition time. The data are divided into equally sized bins of cell events. For each bin, the model corresponding to each method is calculated, and every bin that differs from the rest is flagged in flowClean, or flagged and removed in flowAI and flowCut. Additionally, flowAI can remove outliers from the flow rate and dynamic range [61]. Due to their different implementations, the level of stringency differs across methods. Thus, the optimal performance will depend on the data and on the parameter settings [60]. It should be noted that all of the methods mentioned above were designed for FC studies and, to our knowledge, have not been applied to MC data.
Because of the differences between FC and MC data, such as the time resolution (events in FC are acquired faster and at higher concentrations than in MC) and negative values in FC versus “0” values in MC, the parameter settings may need to differ; however, to date no data exist to support this statement. This is an unexplored niche open for further studies.
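The shared principle of these tools, splitting the acquisition into equally sized bins and flagging bins that differ from the rest, can be sketched in Python as follows. This is a deliberately simplified illustration based on bin medians and the median absolute deviation; it does not reproduce the models used by flowAI, flowCut or flowClean, and the function name `flag_unstable_bins`, the threshold and the simulated signal are assumptions.

```python
import statistics


def flag_unstable_bins(signal, bin_size, n_mads=3.0):
    """Flag bins whose median departs from the typical bin median.

    signal: intensities of one channel, ordered by acquisition time.
    Returns a list of (bin_index, bin_median, flagged) tuples.
    """
    bins = [signal[i:i + bin_size] for i in range(0, len(signal), bin_size)]
    medians = [statistics.median(b) for b in bins]
    center = statistics.median(medians)
    mad = statistics.median([abs(m - center) for m in medians])
    mad = mad or 1e-9  # avoid a zero threshold when all bins are identical
    return [(i, m, abs(m - center) > n_mads * mad)
            for i, m in enumerate(medians)]


# Simulated channel: 8 stable bins of 50 events, then 2 bins with a signal drop
stable = [100 + (i % 5) + (i // 50) % 2 for i in range(400)]
clogged = [60 + (i % 5) for i in range(100)]

result = flag_unstable_bins(stable + clogged, bin_size=50)
flagged = [i for i, _, bad in result if bad]
print(flagged)
```

Events in the flagged bins could then be removed or only marked, mirroring the difference between the remove-by-default and flag-only behaviors described above.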

Data debarcoding and dead cells/debris gating

In order to obtain de-barcoded data, deconvolution of the raw events needs to be performed. The most common way to debarcode MC data is to use a single-cell deconvolution algorithm [64]. User-friendly programs and R-based functions that can be used for debarcoding are listed in Table 2. For FC data, automated deconvolution methods include rectangleGate from the flowCore package [65] and flowClust clustering methods, when the number of wanted clusters is equal to the number of barcoded samples [32]. Doublets, debris, and dead cells introduce noise into the data and should be removed prior to data analysis, as they affect clustering results. As mentioned before, the definition of an event is quite different for FC and MC, and hence the gating strategy will differ. In FC, the FSC parameters height (H) and area (A) are usually plotted against each other and used to eliminate doublets. The events that fall off the diagonal are defined as doublets, as they are characterized by the same height but a different area of the signal curve [6]. However, debris and dead cells can overlap with the cell populations of interest, and scatter parameters can change depending on the sample processing protocol [66]. Therefore, it is recommended to stain live cells and populations of interest with specific fluorescent probes. For MC, as data are usually acquired with calibration beads [67], the beads need to be identified using bead-specific channels and removed manually, or automatically using e.g. the CATALYST package [51]. The nucleated, intact cells are defined by balanced intensity for Ir, which distinguishes them from Ir-low debris and Ir-high doublets. If red blood cells, or other non-nucleated particles, need to be defined, the use of specific probes is required. Doublets are a real challenge in MC, as FSC and SSC parameters cannot be used. Instead, users define them based on balanced Ir staining and event length [11] or Gaussian parameters, such as residual, offset, center, and width [68], [69].
It is worth noting that barcoding staining with 3 different isotopes per sample helps to identify and remove doublets [64], thus increasing sample quality. Among other platforms FlowJo and Cytobank can be used for manual gating, or alternatively data can be imported in an R environment using e.g. flowWorkspace package [70]. If gating is provided for some of the files, semi-supervised gating methods like flowLearn could be used to reproduce the gating strategy for the remaining data [71]. This algorithm employs the gating thresholds provided as input and transfers them to the rest of the samples using derivative-based density alignments. Packages like flowStats [72], flowDensity [73] or OpenCyto [138] (a framework for constructing automated gating hierarchy) can be useful to build user-defined gating strategies. Although manual inspection is always advised, the automated approach should be considered for projects generating a high number of files.
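As an illustration of the FSC-A versus FSC-H doublet exclusion described above for FC, the following Python sketch keeps the events whose area-to-height ratio lies close to the median ratio (the diagonal). The 20% tolerance and the function name are hypothetical choices, not values from any of the cited packages, and a real gate would normally be drawn or fitted on the actual data.

```python
from statistics import median


def singlet_mask(fsc_area, fsc_height, tolerance=0.2):
    """Return True for events lying near the FSC-A vs FSC-H diagonal.

    Assumes all heights are positive. Events whose area/height ratio
    deviates from the median ratio by more than `tolerance` (relative)
    are treated as doublets.
    """
    ratios = [a / h for a, h in zip(fsc_area, fsc_height)]
    reference = median(ratios)
    return [abs(r - reference) <= tolerance * reference for r in ratios]


# Toy events: the fourth one has a normal height but roughly twice the area.
fsc_a = [100, 110, 95, 200, 105]
fsc_h = [100, 108, 96, 102, 104]

mask = singlet_mask(fsc_a, fsc_h)
singlets = [i for i, keep in enumerate(mask) if keep]
```

The same pattern (a per-event score compared against a tolerance) would apply to MC only after replacing the scatter parameters with, for example, Ir intensity and event length, since scatter is not measured there.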

Staining irregularities, data normalization and removal of batch effects

Inspection of marker expression levels across all files and batches is an important step of sample quality control. Staining irregularities, such as a loss of separation between positive and negative values for a given marker, or significant changes in the signal intensity, must be identified and removed, as they can affect event classification into specific clusters [45]. Recently the AOF algorithm, which uses cell frequencies to calculate the average of overlapping cells per channel, was applied to more than 2000 files in MC [20]. Based on calculated sample scores and user-defined thresholds, AOF identified problematic marker expression, and affected files were discarded prior to analysis. This algorithm might be a good expansion of the quality control pipeline; however, it should be used with caution, since the signal changes could be due to biological or technical variation. Barcoding and reference samples can help to distinguish between these two possibilities, and the introduction of normalization and batch effect correction can help to save files instead of discarding them. The technical variability can come from day-to-day differences in experimental and instrumental performance. Instrument variations that cannot be controlled by the users (e.g. differences in daily instrument calibration) are identified and removed by normalization. The variations in the experimental procedure (e.g. slight differences in staining) are identified and removed via batch effect correction [74]. Both will be discussed below. The acquisition time in FC and MC differs, from a few minutes in FC up to a whole day in MC for barcoded samples, and therefore requires different approaches for normalizing the data. In FC, the use of single-stained capture beads and rainbow beads just before sample acquisition was reported [28], [75] to optimize PMT voltages, resulting in similar MIs for the markers.
As FC experiments are shorter in acquisition time, it is assumed that the MIs will be equivalent for the samples acquired within the same day. On the other hand, in MC a signal drop caused by progressing CyTOF decalibration is frequently observed, especially when long, barcoded samples are run. In order to correct for it, bead-based normalization was introduced in [67] and modified by Fluidigm. The algorithm uses commercially available calibration beads, spiked and acquired together with the sample. Hence, changes in the signal can be tracked through the acquisition time. Next, the beads are identified and the median intensities of the beads are calculated in defined time intervals across all files. Based on the obtained values, the global mean for each bead is calculated and used as a target value. To obtain the transformation factor, a linear model using the global means and interval-specific intensities is calculated. This factor is then applied to all cell events and interpolated to all markers in the corresponding intervals and files. Although run-to-run machine variation can be optimized for both MC and FC, the technical differences introduced upon sample preparation will remain. Therefore, normalization and batch effect correction play important roles in downstream analysis. The fdaNorm and gaussNorm algorithms were developed to correct the files across the experiments [76]. They both perform density-based normalization per single channel using ungated .FCS files. The algorithms assume that each marker has its characteristic number of density peaks, called landmarks, which are shared by all samples and can be identified even with some changes in MI. During normalization these density peaks are shifted to align the samples. Although the algorithms differ in their implementation, they perform similarly in the context of resolution in binary markers like CD3, CD4 or CD8.
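The bead-based normalization steps described above (interval medians, global means as targets, a linear model per interval, interpolation over time) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm [67] or the Fluidigm implementation: the linear model is taken to be a least-squares slope through the origin, the factor is interpolated linearly between interval time points, and all names are hypothetical.

```python
from statistics import mean


def interval_slopes(bead_medians):
    """Compute one correction factor per time interval.

    bead_medians: list with one entry per time interval; each entry is a
    list of median bead intensities, one per bead channel.
    """
    n_channels = len(bead_medians[0])
    # Target value per bead channel: global mean across all intervals.
    targets = [mean([iv[c] for iv in bead_medians]) for c in range(n_channels)]
    slopes = []
    for iv in bead_medians:
        # Least-squares slope through the origin: target ~ slope * interval median.
        slopes.append(
            sum(x * y for x, y in zip(iv, targets)) / sum(x * x for x in iv)
        )
    return slopes


def factor_at(time, interval_times, slopes):
    """Linearly interpolate the correction factor at an event's acquisition time."""
    if time <= interval_times[0]:
        return slopes[0]
    if time >= interval_times[-1]:
        return slopes[-1]
    for k in range(len(interval_times) - 1):
        t0, t1 = interval_times[k], interval_times[k + 1]
        if t0 <= time <= t1:
            weight = (time - t0) / (t1 - t0)
            return slopes[k] + weight * (slopes[k + 1] - slopes[k])


# Two bead channels whose signal halves between the first and second interval.
medians = [[200, 400], [100, 200]]
times = [0, 10]

slopes = interval_slopes(medians)
mid_factor = factor_at(5, times, slopes)
normalized_event = 80 * mid_factor  # cell event acquired at time 5
```

Here the early interval is scaled down and the late interval scaled up towards the common target, and an event acquired halfway receives an intermediate factor.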
When using gaussNorm, the number of density peaks needs to be known upfront for each marker, while fdaNorm estimates peaks automatically. The remarks and an extended version of the fdaNorm algorithm can be found in [77]. In this version the reference file provides information about marker distributions together with a gating template, and additionally normalization is performed during the gating. The reason for these changes is that the marker densities can differ across distinct populations, affecting the normalization process, and the use of a reference sample with gating upon normalization improves the automation process. These methods perform well for automated gating, as the alignment of density peaks facilitates the implementation of a reproducible gating hierarchy; however, they require previous knowledge of the analyzed cells. This can be useful in clinical studies when known populations are quantified in a relatively short time, or for the extraction of cell frequencies identified using binary markers. However, as the peak intensities are shifted, comparison of the MI cannot be performed, and part of the biological information is lost. As mentioned before, the inclusion of reference samples is a useful tool to track batch effects introduced during sample preparation. Recently three methods that take advantage of it became available to researchers, and will be discussed. Shaham et al. [78] introduced a deep learning approach called BatchEffectRemoval. This approach is based on Maximum Mean Discrepancy (MMD) and Residual Nets, and corrects the distribution of one sample to its corresponding pair, collected at a different time point. Although it can be a good solution when time point experiments are performed, its performance in MI-sensitive markers is still questionable. CytoNorm [79] and CytofBatchAdjust [80] are two alternatives that use reference samples aliquoted across the batches to obtain batch-specific transformation factors.
CytoNorm starts with FlowSOM [81] clustering for each reference file. At the cluster level, quantiles for each marker are computed and the mean quantile distribution is calculated using values from all the reference files. This information is used to learn the appropriate transformations for each batch and to correct for it. One of the CytoNorm assumptions is that the batch effects are small enough not to affect the FlowSOM clustering results. In other words, although samples differ at the cluster level, the metaclustering that defines cell populations should be the same across all reference samples. If not, some artifacts can be introduced to the data [80], and therefore a careful and detailed investigation should be performed before normalizing collected batches. On the other hand, CytofBatchAdjust performs the normalization on ungated files, where batches can be scaled to a user-defined percentile, mean, or median, or adjusted by quantile normalization. Both algorithms give the advantage of preserving the biological information contained in MI. However, it is important to ensure that the reference sample is prepared using the same protocol as for the studied samples. Therefore, upfront assumptions about sample composition should be taken into consideration.
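The idea of deriving batch-specific transformation factors from a reference sample aliquoted across batches can be sketched in Python as below. The example scales each batch so that a chosen percentile of one marker in its reference aliquot matches that of an anchor batch. It is only a sketch in the spirit of percentile scaling on ungated data, not the CytofBatchAdjust or CytoNorm implementation; the anchor batch, the 80th percentile and the function names are assumptions.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def batch_factors(reference_by_batch, anchor_batch, q=80):
    """Scaling factor per batch, learned from the reference aliquots only.

    Assumes the chosen percentile is positive in every batch.
    """
    target = percentile(reference_by_batch[anchor_batch], q)
    return {
        batch: target / percentile(values, q)
        for batch, values in reference_by_batch.items()
    }


def adjust(values, factor):
    """Apply a batch factor to one marker of a study sample."""
    return [v * factor for v in values]


# One marker measured in the reference aliquot of two batches;
# batch2 shows twice the signal of batch1.
reference = {
    "batch1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "batch2": [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
}

factors = batch_factors(reference, anchor_batch="batch1")
corrected = adjust([4, 10, 16], factors["batch2"])  # a study sample from batch2
```

Because a single multiplicative factor is applied per batch and marker, relative intensity differences within a batch are kept, which is the property of preserving MI information mentioned above.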

Data analysis

Data visualization – dimensionality reduction methods

Manual gating not only aims at extracting the important features, but also gives a good insight into data quality, variability, structure or differences between groups of individuals. In high-dimensional data, the same inspection should be performed using dimensionality reduction or clustering-based approaches. The goal of the dimensionality reduction methods is to preserve the structure of high-dimensional data in a lower-dimensional, easier to interpret, 2D or 3D map. These methods can be divided into linear and non-linear tools. Linear methods, represented by PCA (Principal Component Analysis) [82], [83], focus on keeping the maximum variance of the points in the lower space, thus keeping the dissimilar points far from each other [84]. On the other hand, non-linear algorithms like t-SNE (t-Stochastic Neighbor Embedding) [84] and its derivatives [85], [86], [87], [92], [97] keep the similar cells close to each other, therefore focusing on local relationship preservation [84]. Some of the tools, like t-SNE and UMAP (Uniform Manifold Approximation and Projection) [88], separate well-known populations, giving a good overview of existing cells. Other methods, like Isomap (isometric feature mapping) [89], [90] or Diffusion Maps [91], visualize differentiation trajectories, as they are able to preserve both local and global distances between cells. PCA is designed to preserve the features with the highest variability in the principal components (PC). It assumes that the most prominent variation will be explained by the first two to three PC, making them easily interpretable. As shown by [92], [93], due to the linear assumption, PCA cannot separate populations well in the first two PC, as immune panels are usually designed in such a way that each marker brings new and independent information. Nevertheless, PCA, as an easily scalable and non-stochastic technique, remains a powerful tool and is widely used in biological and clinical cytometry studies, as shown in [94], [95], [96].
t-SNE is a state-of-the-art visualization method that projects high-dimensional information into easily interpretable 2D maps [84]. t-SNE calculates two similarity matrices based on the distance in the high- and low-dimensional space using pairwise comparison across all the points. Next, the algorithm iteratively minimizes the difference between the two matrices, which results in the optimized position of each cell in the 2D space [55]. The pairwise comparison in t-SNE has its pros and cons: it makes the algorithm robust and accurate, but the more cells are analyzed, the more pairs need to be computed and the higher the computational cost. This limits the use of t-SNE in FC/MC studies where thousands or even millions of events are acquired. To overcome this issue, random downsampling (generation of a smaller subset of cells) is often used, at the risk of losing rare populations. Therefore, new implementations were developed, aiming at limiting the computational power required to obtain high-resolution data. Among them, BH-SNE (Barnes-Hut-SNE) [97] reduces the number of pair comparisons by constructing a tree-like structure. This implementation is used in viSNE, published by Amir et al. [92]. HSNE (Hierarchical Stochastic Neighbor Embedding) [85] builds on A-tSNE (t-SNE approximation), where, instead of computing precise distances, an approximated k-nearest neighborhood graph is computed and embedded using BH-SNE. FIt-SNE (Fast Interpolation-based t-SNE) [86] uses Fourier interpolation to speed up the convolution step, and opt-SNE [87] allows fine-tuning of t-SNE parameters, like the number of iterations, to obtain high-resolution maps in a shorter time. It should be noted that t-SNE is stochastic, which means that every new run will give a slightly different visualization. Consequently, researchers should perform multiple runs in order to obtain a good data representation.
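The two scalability points above, the growth of pairwise comparisons with the number of cells and the risk of losing rare populations by random downsampling, can be made concrete with a short Python calculation. It is a back-of-the-envelope sketch, not part of any t-SNE implementation; the function names and the example numbers are assumptions.

```python
def n_pairs(n_cells):
    """Number of distinct cell pairs in an exact pairwise comparison."""
    return n_cells * (n_cells - 1) // 2


def prob_population_lost(n_total, n_rare, n_sampled):
    """Probability that a random subsample (without replacement) of
    n_sampled cells contains none of the n_rare cells of a population."""
    probability = 1.0
    for i in range(n_rare):
        # Chance that the i-th rare cell also falls outside the subsample.
        probability *= max(n_total - n_sampled - i, 0) / (n_total - i)
    return probability


pairs_small = n_pairs(1_000)
pairs_large = n_pairs(1_000_000)

# A population of 5 cells among 1,000,000 events, downsampled to 100,000 events.
p_lost = prob_population_lost(1_000_000, 5, 100_000)
```

Going from a thousand to a million cells multiplies the number of pairs by roughly a million, and in the example the five-cell population is more likely than not to disappear completely from the downsampled file.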
Comparison of multiple maps can only be done if the samples were run simultaneously with the same settings. Jensen-Shannon divergence, a statistical method that measures the similarity between two probability distributions, can be useful to compare the projections from the same data set, as shown in [92], [98]. Recently a new visualization tool called UMAP gained attention in the cytometry field. This tool also preserves global distances between cell types, while t-SNE conserves only close neighborhoods [88], [99]. For this reason UMAP was used to recapitulate human hematopoiesis, and is useful for cell continuity visualization [99]. Additionally, both UMAP and FIt-SNE can analyze more cells than t-SNE in a shorter time [99]. Isomap [89] and Diffusion maps [91] also preserve global relatedness and continuity between cells instead of calculating the pairwise Euclidean distance. Isomap uses non-linear geodesic distance [89]. Diffusion map, introduced by [91] and adapted to single-cell studies by [101], constructs diffusion matrices based on random walk probabilities between cells and generates diffusion components (DC, known as eigenvectors) that, similarly to the PC, correspond to the largest coefficients of the data [102], [103]. Even though some improvements were made on the t-SNE implementation and faster algorithms like UMAP were built, the scalability problem remains. Most of the embedding techniques were first used on transcriptomic data where, in contrast to cytometry, a relatively small number of cells is described by a much larger number of markers. Although other dimensionality reduction and topology inference algorithms can be used, the lack of good implementations that enable handling of millions of cells prevents researchers from applying them to big files, as noted in [104].
Although non-linear dimensionality reduction methods are powerful in projecting phenotypically similar cells, understanding the marker contribution to cell segregation can be difficult, as it requires plotting multiple markers in individual maps. In such cases, studying marker co-expression is even more challenging, as was pointed out in [96], [105]. One-SENSE (One-Dimensional Soli-Expression by Non-Linear Stochastic Embedding) addresses this limitation and proposes a 2D assignment of the markers to categories that can then be visualized using a combination of t-SNE maps and heatmaps [105]. This method was successfully applied to study T cell and dendritic cell heterogeneity [100], [105], [106].
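The Jensen-Shannon divergence mentioned earlier in this section for comparing maps of the same data set can be computed as in the following Python sketch. Each 2D map is first summarized as a grid histogram, and the divergence is then calculated between the two histograms. The grid binning, the shared coordinate range and the function names are assumptions made for illustration; the cited studies may have used a different density estimate.

```python
from math import log2


def grid_histogram(points, n_bins, lo, hi):
    """Normalized 2D histogram of map coordinates, flattened to a list.

    Assumes all coordinates lie in [lo, hi].
    """
    counts = [0] * (n_bins * n_bins)
    width = (hi - lo) / n_bins
    for x, y in points:
        ix = min(int((x - lo) / width), n_bins - 1)
        iy = min(int((y - lo) / width), n_bins - 1)
        counts[ix * n_bins + iy] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions.

    0 means identical distributions, 1 means no overlap.
    """
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Kullback-Leibler divergence, skipping empty bins of u.
        return sum(a * log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


map_a = grid_histogram([(0.1, 0.1), (0.9, 0.9)], n_bins=2, lo=0.0, hi=1.0)
map_b = grid_histogram([(0.2, 0.2), (0.8, 0.8)], n_bins=2, lo=0.0, hi=1.0)
map_c = grid_histogram([(0.1, 0.9), (0.9, 0.1)], n_bins=2, lo=0.0, hi=1.0)

same = js_divergence(map_a, map_b)
different = js_divergence(map_a, map_c)
```

Maps whose cells occupy the same grid regions give a divergence of 0, while maps with no overlapping regions give 1.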

Data visualization – clustering methods

Clustering-based algorithms group similar cells and use visualization tools to represent them in a lower-dimensional space [13]. When choosing the best clustering method, several requirements should be considered, such as the need for downsampling, reproducibility, rare cell detection, and running time. These variables were measured by Weber et al., where several of the currently used cytometry clustering algorithms were compared, identifying FlowSOM as a good trade-off between quality and time [107]. Since its publication, FlowSOM [81] has become a widely used clustering algorithm in the field of cytometry [54], [108], [109]. This algorithm uses a two-step clustering process: a SOM (Self-Organizing Map) and consensus hierarchical clustering. A SOM, a type of artificial neural network, contains a grid of nodes where each node represents a point in a multidimensional space. The SOM reproduces the data topology by assigning the most similar cells to the same node or its closest neighbors. Increasing the grid size increases the possibility of finding rare populations. However, as shown by Weber et al., the reproducibility of the data can be compromised and additional splitting of the largest populations can be seen. In the second step, node centers are grouped into metaclusters using consensus hierarchical clustering, and final cluster labels are obtained. The data can be visualized using a minimal spanning tree, like in SPADE [44], or in a heatmap [54]. Although similar results can be obtained with both methods, the two-step clustering in FlowSOM accelerates analysis and avoids downsampling, making it a better choice. Unfortunately, the stochasticity problem remains, and unless the seed (starting analysis point) is pre-defined, the comparison between different runs cannot be done. When comparing clustering performance, the F1 score, which measures a test's accuracy using precision and recall, could be applied [107].
Alternatively, the CytoCompare algorithm, which computes the distance between the clusters using marker distributions [110], or the Jaccard coefficient [111] can also be applied. Multiple tools and workflows implementing FlowSOM have been recently published: EmbedSOM improves data visualization [112]; diffcyt, a new computational framework for differential discovery analyses [56], will be discussed below; and Ek’Balam provides hierarchy-based clustering in the Astrolabe Cytometry Platform [20]. All these applications emphasize the broad utility of FlowSOM. However, as noticed in [113], one of the major drawbacks of this algorithm is the user-defined number of clusters, which limits the understanding of population diversity and introduces researcher supervision. Other popular clustering approaches could be used instead, like Phenograph, which uses k-nearest neighbors (KNN) to represent phenotypically similar cells as highly interconnected nodes [114], or X-shift, which also applies KNN with density estimation [115]. Both tools ranked high in benchmark studies, especially for rare population detection [107], [116]. They have the ability to predict the number of clusters in a given sample, although they scale poorly and require downsampling. Additionally, the combination of t-SNE-based dimensionality reduction with density-based clustering was also reported and successfully applied in the immune diversity study of the lymphoid compartment using ACCENSE (Automatic Classification of Cellular Expression by Nonlinear Stochastic Embedding) [117], and of the myeloid compartment using DensVM (Density-based clustering aided by Support Vector Machine), which combines a density-based algorithm with machine learning techniques [90].
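The F1 score and the Jaccard coefficient mentioned above for comparing clustering results can be computed from sets of cell indices, as in this Python sketch. Matching each manually gated population to its best-scoring cluster is one common convention and is an assumption here, as are the function names; the cited benchmark may use a different matching rule.

```python
def f1_score(cluster, reference):
    """F1 score (harmonic mean of precision and recall) of a cluster
    against a reference population, both given as sets of cell indices."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_f1(clusters, reference):
    """Highest F1 score reached by any cluster for one reference population."""
    return max(f1_score(cells, reference) for cells in clusters.values())


# Hypothetical cluster assignments and one manually gated population.
clusters = {"c1": {0, 1, 2, 3}, "c2": {4, 5, 6, 7, 8}}
gated = {2, 3, 4, 5, 6, 7}

f1_c1 = f1_score(clusters["c1"], gated)
jaccard_c1 = jaccard(clusters["c1"], gated)
best = best_f1(clusters, gated)
```

Both measures equal 1 for a perfect match and 0 for no overlap, so they can be averaged over populations to summarize a whole clustering run.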

Looking for the meaning: analysis of cytometry data – biomarker discovery

FC and MC are commonly used as biomarker discovery platforms to improve diagnosis or allow the prediction of response to therapies. Typically the cell abundances and their median marker expressions are extracted using clustering or dimensionality reduction. Then statistical tests are run to associate cell differential abundance (DA) and states (DS) with specific phenotypes, while correcting for different covariates [54]. This approach is presented in various analysis pipelines [54], [56], [118], [119], and the main ones will be briefly discussed. The analysis pipeline of Nowicka et al. [54], called CyTOF workflow, uses FlowSOM to cluster the data, followed by differential expression analysis to identify cell populations responding to the stimulation. Two different models are applied depending on the type of data: the Generalized Linear Mixed Model (GLMM) and the Linear Mixed Model (LMM) for DA and DS respectively. In both cases the mixed model with random intercept is used to account for random effects caused by variations across the individuals. The generalized model is used for DA analysis to account for the non-normal distribution of cell proportions when samples with lower cell counts are present. cydar [119] detects differentially abundant cells by assigning them to overlapping ‘hyperspheres’ in the high-dimensional space of markers. Cell counts and median marker expression are calculated within each hypersphere for each sample. Finally, the negative binomial generalized linear model from edgeR is used to test the differences between two groups. This model, similarly to GLMM, improves the estimation of dispersion parameters. Both pipelines provide flexibility in the adjustment of experimental covariates like batch, age or sex. However, only CyTOF workflow distinguishes between phenotypical and functional (e.g. phosphorylation state) markers, making the analysis easier to interpret in the context of biological knowledge.
Besides regression models, different machine learning approaches were successfully used to identify biomarkers. In the “Flow Cytometry: Critical Assessment of Population Identification Methods” (FlowCAP) challenge IV [120], two pipelines, FloReMi.1 [121] and flowType-RchyOptimyx [122], provided statistically significant predictive values in the context of patient progression from HIV+ to AIDS. Both methods use flowType [123] to detect cell populations and apply random survival forest, using an ensemble of decision trees in FloReMi, or dynamic programming together with graph theory in flowType-RchyOptimyx [124], to find the best gating hierarchy correlated with the clinical outcome. Citrus combines the cell classification obtained by hierarchical clustering with the automated selection of features based on a regularized classification model to associate the obtained features (cell percentages and median marker expressions) with the endpoint of interest. This algorithm was successfully used to identify cell subsets associated with AIDS-free survival [125]. However, as commented by Arvaniti et al. [126], a high number of irrelevant events used as clustering input can either result in model overfitting or prevent rare cell detection. To address this issue, the authors developed CellCNN, an algorithm that applies convolutional neural networks in representation learning, making use of the sample classes during population identification. CellCNN is designed to detect rare populations with a frequency lower than 0.01% and was able to identify minor survival-associated cell populations in HIV-infected patients or spiked-in rare leukemic blast populations of two AML subclasses [126]. Regression and machine learning approaches are not mutually exclusive, and their combination was presented by Krieg et al. [108].
They used FlowSOM together with GLMM to find relevant cell populations that distinguish responders to anti-PD1 therapy in metastatic melanoma patients. These features were further characterized using CellCNN algorithm. The comparison between different regression based methods was integrated in the package diffcyt and compared to other machine learning algorithms [56]. According to this report diffcyt outperforms other tools in rare population and differential state detection between two conditions. However, it is crucial to ensure the proper selection of clustering setting and regression method. None of the presented approaches is perfect, and their performance will depend on the biological question, type and volume of the data. It is important to consider what type of analysis is required [13]. Different approaches should be used when rare cell populations or activation of a particular known cell population are targeted. Results should be verified with at least two different algorithms, incorporating also traditional methods when verifying the outcome. Various ready-to-go R or python-coded analytical pipelines or user friendly interfaces are nowadays available, with no need for strong programming skills [54], [127], [128], [129]. Benchmarking that incorporates the newest algorithms and both FC and MC data should be organized in order to guide FC and MC users through different analytical approaches, pipelines, and algorithms.
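A differential abundance comparison of the kind described in this section starts from the proportion of each cluster in each sample and then tests whether the proportions differ between groups. The Python sketch below uses an exact permutation test on the difference of group means as a deliberately simple stand-in: it is not the GLMM of the CyTOF workflow nor the edgeR model used by cydar, it ignores covariates and random effects, and all names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def cluster_proportion(labels, cluster):
    """Fraction of a sample's cells assigned to the given cluster."""
    return sum(1 for label in labels if label == cluster) / len(labels)


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    n_extreme = 0
    n_total = 0
    # Enumerate every way of relabelling the samples into two groups.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


# Proportion of one cluster in three samples per group (made-up values).
controls = [0.10, 0.12, 0.11]
patients = [0.30, 0.28, 0.33]

p_value = permutation_p_value(controls, patients)
example_fraction = cluster_proportion(["T", "B", "T", "NK"], "T")
```

With three samples per group there are only 20 possible relabellings, so the smallest attainable two-sided p-value is 0.1. This illustrates why model-based approaches that share information across clusters, as in the pipelines above, are preferred for small cohorts.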

Looking for the meaning: analysis of cytometry data – trajectory inference

Besides being a biomarker discovery platform, FC and MC are commonly used in the modeling of cell developmental stages with trajectory inference (TI) methods. These methods estimate for each cell a numeric value, called pseudotime, which orders the cells within the dynamic process of interest. This makes it possible to define and study different transition stages [130]. The typical TI workflow comprises a dimensionality reduction step followed by trajectory modeling using the tools described below. Most of the earlier algorithms were designed to model fixed topologies, such as a one-dimensional path, while currently bifurcating points or tree-like structures can also be detected. Wanderlust was applied to reconstruct human B cell lymphopoiesis [131]. It is an example of one-path trajectory modeling, and was the first algorithm designed to study developmental stages using MC data. It is a graph-based method where each cell is represented as a node connected to its closest neighbors by edges. To eliminate noise and the possibility of introducing short circuits, multiple graphs and trajectories are built using random waypoint cells and l-out-of-k-nearest neighbors (l-k-NNG). The final position of the cell is the average over all graph trajectories. Two main assumptions are made when using this tool: all cells that represent the non-branching differentiation pathway are present, and the changes in marker expression are gradual. Therefore, proper marker selection is crucial. Wishbone [132] was used to track the development of T cells in the mouse thymus. It is an algorithm designed to detect bifurcating developmental trajectories. Similar to Wanderlust, Wishbone is based on k-NNG, where the shortest path between two cells is used as a distance metric. However, as the bifurcating points are prone to build short circuits due to insufficient marker differences, instead of subsampling the edges as in Wanderlust, Wishbone takes advantage of diffusion maps.
Because of this, the major structure is kept in the first diffusion components, which limits the tendency of noise to create short circuits. The embedded space is used to construct the k-NNG. In the case of Wishbone the waypoints have a double role: first, they allow the cells to be ordered robustly along the trajectory, and secondly, together with spectral clustering, they provide information about the placement of waypoints on the same or on a different branch, thus giving the branching point information. The robustness of both Wanderlust and Wishbone depends on a user-defined starting cell, whereas p-Creode [133] constructs a tree-like structure in an unsupervised manner. This algorithm introduces density-based downsampling prior to analysis, and is also based on graph theory, using k-NNG construction with a density-based modification. After the construction of multiple trajectories, a new metric called the p-Creode score is used to select the most representative trajectory. All the above-mentioned methods were recently benchmarked using single-cell RNA sequencing data [104]. This study provides useful guidelines for choosing proper TI methods. However, due to the different nature of cytometry and sequencing data, the outcome can be different. Therefore, it would be helpful to provide a similar comparison using MC/FC data.
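The core idea shared by the graph-based methods above, a k-nearest-neighbor graph in marker space and shortest-path distances from a user-defined starting cell used as pseudotime, can be sketched in Python as follows. This is a single-graph simplification: it omits the waypoints, the l-out-of-k graph ensembles of Wanderlust and the diffusion map embedding of Wishbone, and the function names and the choice of k are assumptions.

```python
import heapq
from math import dist, inf


def knn_graph(cells, k):
    """Undirected k-nearest-neighbor graph with Euclidean edge weights."""
    graph = {i: [] for i in range(len(cells))}
    for i, cell in enumerate(cells):
        neighbours = sorted(
            (dist(cell, other), j) for j, other in enumerate(cells) if j != i
        )[:k]
        for distance, j in neighbours:
            graph[i].append((j, distance))
            graph[j].append((i, distance))
    return graph


def pseudotime(cells, start, k=2):
    """Shortest-path distance from the start cell, scaled to [0, 1].

    Cells not connected to the start cell keep an infinite value.
    Assumes at least one other cell is reachable from the start cell.
    """
    graph = knn_graph(cells, k)
    shortest = [inf] * len(cells)
    shortest[start] = 0.0
    queue = [(0.0, start)]
    while queue:  # Dijkstra's algorithm
        d, i = heapq.heappop(queue)
        if d > shortest[i]:
            continue
        for j, weight in graph[i]:
            if d + weight < shortest[j]:
                shortest[j] = d + weight
                heapq.heappush(queue, (d + weight, j))
    longest = max(x for x in shortest if x < inf)
    return [x / longest if x < inf else inf for x in shortest]


# Five cells along a gradual one-marker progression, given in shuffled order.
cells = [(2.0, 0.0), (0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

times = pseudotime(cells, start=1)
ordering = sorted(range(len(cells)), key=lambda i: times[i])
```

The recovered ordering follows the marker gradient from the starting cell. In real data, noisy edges can create the short circuits discussed above, which is what the additional steps of the published methods are designed to prevent.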

Conclusions

FC and MC are powerful high-dimensional technologies in single-cell biology. They are becoming important tools in biomarker discovery research, disease monitoring, and medical diagnostics. The rapid increase in dimensionality gives an opportunity to understand cell diversity in detail, narrow the research to fine cell populations, and by doing so, enable precision in the development of new therapies and biomarkers. However, dimensionality reduction and automated analysis require high-quality data, analytical skills, and powerful algorithms to meaningfully process the multidimensional space. As previously discussed, the design and execution of a good cytometry-based study is not a trivial process. Small details like changes in stocks, pipetting errors, shifts in machine performance, and improper data preprocessing can significantly contribute to data variation. Controlling for batch effects, although well adopted in transcriptomic data, is still inefficient and not often applied in MC and FC due to different data structures. It should be noted that inclusion of covariates like “batch effect” in the statistical model does not eliminate the bias introduced upon clustering, and therefore batch effects should be corrected before data analysis, and ideally prevented when preparing the SOP. Many dimensionality reduction and clustering methods are available, and they should be combined to verify and confirm results. To make analysis accessible to non-programming researchers, many packages bring together various preprocessing and analysis tools that can be used through user-friendly interfaces (Table 2). Hence, high-dimensional analysis can be available to both biologists and bioinformaticians. Since the single-cell high-dimensional era is just starting, it is important to take care when interpreting the data. Careful validation with multiple methods and standard approaches like traditional manual gating should be implemented in the analysis pipelines.

Author statement

PR wrote the original draft. MAR provided critical review of the manuscript. CM designed the work, supervised PR and revised the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
