Paulina Rybakowska, Marta E Alarcón-Riquelme, Concepción Marañón.
Abstract
High-dimensional single-cell technologies have revolutionized the way biological systems are studied, and polychromatic flow cytometry (FC) and mass cytometry (MC) are two of the drivers of this revolution. Because up to 30 and 50 dimensions, respectively, can be measured per single cell, they allow deep phenotyping combined with studies of cellular function, such as cytokine production or protein phosphorylation. In parallel, the bioinformatics field is developing algorithms that can process incoming data and extract the most useful and meaningful biological information. However, the success of automated analysis tools depends on the generation of high-quality data. In this review we present the most recent FC and MC computational approaches used to prepare, process and interpret high-content cytometry data. We also underscore proper experimental design as a key step for obtaining good-quality data.
Keywords: Bioinformatics; Computational tools; Flow cytometry; Mass cytometry; Single-cell proteomics
Year: 2020 PMID: 32322369 PMCID: PMC7163213 DOI: 10.1016/j.csbj.2020.03.024
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Artifacts and their prevention in high-dimensional flow and mass cytometry data preparation and analysis.
| Stage | Source | Effect/Artifact | Prevention |
|---|---|---|---|
| Experimental | Change in reagent batches | General change in protocol performance that can introduce a batch effect; unpredicted contamination (important in MC studies); a) | Order the quantity needed for the whole experiment, if product stability allows it; |
| | Change in the protocol, e.g. a) use of fixed vs. fresh blood; | a) Change in sample stability; | Decide on the staining approach before sample collection; |
| | Change in cocktail preparation, e.g. a missing antibody, wrong fluorochromes, a different clone | Different staining intensity; | Prepare one large antibody cocktail, aliquot it and store it frozen (in MC), lyophilized or desiccated (MC and FC) |
| | Pipetting errors | Variation in staining intensity, especially problematic when MI will be compared; variation in the number of collected events | Barcode the samples and process them in batches |
| | Improper antibody titration | | Perform titration experiments |
| | Unspecific staining | Unknown co-expression of markers, | Block Fc receptors, |
| | Incorrect panel design | High spreading error that can mask dim populations | Use tools like Guided Panel Solution or Maxpar Panel Designer to design your panel; |
| | Metal contamination (MC only) | Unspecific signal registered as events in the .FCS file; | Troubleshooting will depend on the source of contamination: |
| Acquisition | a) Improper cytometer calibration; b) | a) Loss in antigen resolution; | Check your machine's performance and get to know its resolution; |
| | Clogging of the device | Signal instability affecting the median expression of the markers | Clean the device when necessary to remove the clog |
| | High acquisition speed or changes in acquisition speed | Change in the doublet-to-singlet ratio; | Keep the speed constant and adequate to obtain a good singlet-to-doublet ratio |
| | Inconsistent sample or panel labeling | Errors when analyzing the files or inability to read in the files | Define a labeling strategy, create a template and keep it constant across the whole project |
| Analysis | Not enough statistical power | Lack of significance | Calculate statistical power |
| | Improper transformation | Inability to distinguish positive and negative events; | Verify the transformation method by visualizing markers, e.g. using FlowJo |
| | Batch effect | Improper data interpretation; | Visualize the batch effect, e.g. using dimensionality reduction tools like PCA or t-SNE; |
| | Improper normalization | Changes in marker expression distribution; | Carefully verify the performance of normalization tools by visualizing the markers |
| | Uncleaned changes in signal intensities | Improper assignment to clusters | Spot problematic files by applying tools like AOF, then normalize them or discard them from the analysis |
| | Uncleaned bad-quality events | Improper assignment to clusters | Verify signal stability and clean if necessary using tools like flowAI, flowClean, flowCut or manual gating |
| | Presence of doublets/high doublet-to-singlet ratio | Biologically incorrect co-expression of markers; | Gate out all doublets |
| | Improper clustering or dimensionality reduction performance | Unstable clusters or cell positions; different results with every clustering run | Verify clustering settings by gating a few example files and calculating the F1 score |
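
The last row of the table suggests checking clustering settings against a few manually gated files using the F1 score. The sketch below is one possible way to do this, not the procedure of any specific tool: it assumes each cell carries a manual gate label and a cluster label, matches every gated population to the cluster that gives the highest F1, and averages the result. The function names `f1_per_population` and `mean_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def f1_per_population(manual_labels, cluster_labels):
    """Match each manually gated population to its best cluster by F1.

    Returns {population: (best_cluster, f1)}.
    """
    if len(manual_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("label lists must not be empty")

    manual_sizes = Counter(manual_labels)    # cells per gated population
    cluster_sizes = Counter(cluster_labels)  # cells per cluster
    overlap = Counter(zip(manual_labels, cluster_labels))  # shared cells

    best = {}
    for (population, cluster), shared in overlap.items():
        precision = shared / cluster_sizes[cluster]
        recall = shared / manual_sizes[population]
        f1 = 2 * precision * recall / (precision + recall)
        if population not in best or f1 > best[population][1]:
            best[population] = (cluster, f1)
    return best


def mean_f1(manual_labels, cluster_labels):
    """Unweighted mean of the best F1 over all gated populations."""
    scores = f1_per_population(manual_labels, cluster_labels)
    return sum(f1 for _, f1 in scores.values()) / len(scores)


# Toy example (hypothetical labels): six cells, three gated populations.
manual = ["T", "T", "T", "B", "B", "NK"]
clusters = [1, 1, 2, 2, 2, 3]
scores = f1_per_population(manual, clusters)
average = mean_f1(manual, clusters)
```

A mean F1 close to 1 indicates that the clusters reproduce the manual gates well. The unweighted mean treats rare and abundant populations equally, which is a design choice; weighting by population size is an equally reasonable alternative.
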
Overview of bioinformatics tools for high-dimensional flow and mass cytometry data analysis.
| Category | Tool | Application | Source |
|---|---|---|---|
| Panel design | Guided Panel Solution | Panel design in flow cytometry | BD Biosciences |
| | Maxpar Panel Designer | Panel design in mass cytometry | Fluidigm |
| Quality control/preprocessing | Average overlap frequency (AOF) | Antibody performance evaluation | R package/Bioconductor package |
| | CATALYST | MC data preprocessing (bead-based normalization; debarcoding; compensation; FlowSOM clustering) | R package/Bioconductor package |
| | flowAI | Signal cleaning | R package/Bioconductor package |
| | flowClean | Signal cleaning | R package/Bioconductor package |
| | flowCore | Basic structures for flow cytometry data | R package/Bioconductor package |
| | flowCut | Signal cleaning | R package (GitHub repository) |
| | flowTrans | Data transformation | R package/Bioconductor package |
| | flowVS | Data transformation | R package/Bioconductor package |
| | flowWorkspace | Representation of and interaction with gated and ungated data in R | R package/Bioconductor package |
| | Single-cell deconvolution algorithm | Debarcoding | CATALYST; Matlab; Fluidigm stand-alone; Updated Single-cell Debarcoder |
| | Optimal Sample Assignment Tool (OSAT) | Sample-to-batch allocation | R package/Bioconductor package |
| Normalization and batch effect correction | gaussNorm | Normalization | R package/Bioconductor package |
| | fdaNorm | Normalization | R package/Bioconductor package |
| | CytoNorm | Normalization using reference sample | R package/Bioconductor package; Plugin FlowJo Exchange |
| | CytofBatchAdjust | Normalization using reference sample | R package/Bioconductor package |
| | BatchEffectRemoval | Normalization using reference sample | Python |
| Dimensionality reduction | Diffusion Maps | Non-linear dimensionality reduction/Trajectory inference | R package/Bioconductor package |
| | Isomap | Non-linear dimensionality reduction/Trajectory inference | R package/CRAN package |
| | PCA | Linear dimensionality reduction | R package |
| | t-SNE | Non-linear dimensionality reduction | R package/CRAN package |
| | BH-SNE (viSNE) | Non-linear dimensionality reduction | Python; Cytobank; Matlab; R package/CRAN package |
| | UMAP | Non-linear dimensionality reduction | Python; R package/CRAN package |
| | One-SENSE | Non-linear dimensionality reduction | R package/Bioconductor package |
| | HSNE | Non-linear dimensionality reduction | Cytosplore |
| | FIt-SNE | Non-linear dimensionality reduction | R package; Matlab; Python; Plugin FlowJo Exchange |
| | EmbedSOM | Non-linear dimensionality reduction | R package/CRAN package; Plugin FlowJo Exchange |
| | opt-SNE | Non-linear dimensionality reduction | Python; Cloud opt-SNE |
| | Jensen-Shannon (JS) divergence | Dimensionality reduction comparison | R package/Bioconductor package |
| Data clustering and automated gating | CytoCompare | Clustering comparison | R package (GitHub repository) |
| | flowClust | Unsupervised clustering | R package/Bioconductor package |
| | flowDensity | Supervised clustering | R package/Bioconductor package |
| | flowLearn | Semi-supervised clustering | R package (GitHub repository) |
| | FlowSOM | Unsupervised clustering | R package/Bioconductor package; Cytofkit |
| | flowType | Unsupervised clustering | R package/Bioconductor package |
| | PhenoGraph | Unsupervised clustering | Matlab; Python; R package |
| | SPADE | Unsupervised clustering | R package/Bioconductor package; Cytobank; Matlab |
| | X-shift | Unsupervised clustering | Standalone application (VorteX); Plugin FlowJo Exchange |
| | DensVM | Unsupervised clustering | R package/Bioconductor package |
| | ACCENSE | Unsupervised clustering | Standalone ACCENSE application |
| Useful pipelines and approaches | CellCNN | Representation learning approach to detect rare cell subsets associated with disease | Python |
| | Citrus | Unsupervised clustering with regularized regression model | R package with GUI; Cytobank |
| | cydar | Unsupervised assignment to hyperspheres, control of the spatial false discovery rate, visualization of changes in abundance | R package/Bioconductor package |
| | Cytofast | Visual and quantitative analysis of cytometry data to discover immune signatures and correlations | R package/Bioconductor package |
| | Cytosplore | Interactive visual analysis system containing t-SNE, HSNE, SPADE | Interactive tool |
| | Cytofkit | Preprocessing; cell subset detection (DensVM, FlowSOM, PhenoGraph or ClusterX); data visualization (PCA, t-SNE, Isomap) | R package/Bioconductor package |
| | diffcyt | Unsupervised clustering with FlowSOM, empirical Bayes moderated tests for statistical analysis | R package/Bioconductor package |
| | flowType/RchyOptimyx | Unsupervised clustering | R package/Bioconductor package |
| | FloReMi | Preprocessing; feature extraction; | R scripts (GitHub repository) |
| | CyTOF workflow | Unsupervised clustering with FlowSOM, generalized linear mixed models or linear mixed models | R package/Bioconductor package |
| | DAMACY | Multivariate method based on PCA and multivariate regression based on Partial Least Squares (PLS) | Matlab |
| | OpenCyto | Facilitates automated gating methods | Bioconductor package/GUI with shinyCyto application |
| Trajectory detection | pCreode | Trajectory inference with multiple branching | Python |
| | Wanderlust | Trajectory inference without branching | Matlab-based interactive tool |
| | Wishbone | Trajectory inference with two branches | Python/Matlab-based interactive tool |
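
The table lists the Jensen-Shannon (JS) divergence as a way to compare dimensionality reduction results. As a minimal sketch of the idea, assuming two 2D embeddings are compared by binning each onto the same grid and treating the bin counts as discrete distributions, the divergence can be computed as below. The helper names `grid_histogram` and `js_divergence`, the grid size and the coordinate range are illustrative assumptions, not the API of the R package referenced in the table.

```python
import math


def grid_histogram(points, bins, lo, hi):
    """Count 2D points on a bins x bins grid covering [lo, hi] on both axes.

    Points outside the range are clamped to the nearest edge bin.
    Returns a flat, row-major list of counts.
    """
    counts = [0] * (bins * bins)
    span = hi - lo
    for x, y in points:
        ix = min(max(int((x - lo) / span * bins), 0), bins - 1)
        iy = min(max(int((y - lo) / span * bins), 0), bins - 1)
        counts[iy * bins + ix] += 1
    return counts


def js_divergence(p, q):
    """JS divergence with base-2 logarithms, so the result lies in [0, 1].

    p and q are non-negative counts or probabilities of equal length;
    both are normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("distributions must have positive mass")
    p = [v / total_p for v in p]
    q = [v / total_q for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy example (hypothetical coordinates): two small embeddings on a 2 x 2 grid.
map_a = grid_histogram([(0.1, 0.1), (0.9, 0.9)], bins=2, lo=0.0, hi=1.0)
map_b = grid_histogram([(0.9, 0.1), (0.1, 0.9)], bins=2, lo=0.0, hi=1.0)
divergence = js_divergence(map_a, map_b)
```

A value of 0 means the two binned maps are identical and 1 means they share no occupied bins. The result depends on the grid resolution, so the same `bins`, `lo` and `hi` should be used for every map being compared.
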
Fig. 1. The experimental and computational data analysis workflow for flow and mass cytometry. For more information about which tools to use and how to design each step well to avoid artifacts, refer to Table 1 and Table 2.