Zicheng Hu, Sanchita Bhattacharya, Atul J Butte.
Abstract
Modern cytometry technologies present opportunities to profile the immune system at single-cell resolution with more than 50 protein markers, and have been widely used in both research and clinical settings. The number of publicly available cytometry datasets is growing. However, the analysis of cytometry data remains a bottleneck due to its high dimensionality, large cell numbers, and heterogeneity between datasets. Machine learning techniques are well suited to analyzing complex cytometry data and have been used in multiple facets of cytometry data analysis, including dimensionality reduction, cell population identification, and sample classification. Here, we review the existing machine learning applications for analyzing cytometry data and highlight the importance of publicly available cytometry data that enable researchers to develop and validate machine learning methods.
Keywords: cyTOF; cytometry; flow cytometry; machine learning; predictive modeling
Year: 2022 PMID: 35046945 PMCID: PMC8761933 DOI: 10.3389/fimmu.2021.787574
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Selected machine learning methods for cytometry analysis.
| Machine learning type | Name | Description |
|---|---|---|
| Dimensionality reduction | PCA | PCA projects high-dimensional data into lower dimensions while preserving as much of the data's variation as possible. |
| | MDS | MDS projects high-dimensional data into lower dimensions while preserving the pairwise distances between cells as closely as possible. Classical MDS and PCA are equivalent when the Euclidean distance is used. |
| | t-SNE | t-SNE (t-distributed stochastic neighbor embedding) is a non-linear dimensionality reduction method. t-SNE transforms the pairwise distances into probabilities based on the t-distribution, thus emphasizing preservation of the data's local structure. |
| | UMAP | UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method based on manifold learning techniques. Similar to t-SNE, UMAP emphasizes preserving the local structure of the data. |
| Unsupervised methods for cell population identification | FLOCK | FLOCK identifies cell populations using density-based clustering. |
| | FlowSOM | FlowSOM maps cells to self-organizing maps and uses consensus hierarchical clustering to identify cell populations. |
| | flowMeans | flowMeans uses K-means clustering and a change point detection algorithm to identify cell populations. |
| | flowMerge | flowMerge first uses Gaussian mixture models to identify cell subsets in the cytometry data and then uses entropy-based criteria to merge closely related cell populations. |
| | MetaCyto | MetaCyto uses a combination of hierarchical clustering and cell population labeling to identify shared cell populations across studies. |
| | SWIFT | SWIFT uses a Gaussian mixture model-based clustering method to identify cell subsets, followed by splitting and merging steps that adjust the number of clusters in order to identify rare subpopulations. |
| | PhenoGraph | PhenoGraph first constructs a nearest neighbor graph of the single cells based on their phenotypic similarity and then partitions the graph into clusters using a community detection algorithm. |
| Supervised methods for cell population identification | LDA for cytometry data | The method trains a linear discriminant analysis (LDA) classifier to identify cell populations. |
| | DGCyTOF | DGCyTOF trains a deep learning model to identify cell populations. A feedback loop is included to adjust between new and unknown cell populations. |
| | DeepCyTOF | DeepCyTOF trains a deep learning model to identify cell populations. DeepCyTOF includes a calibration step to adjust for batch effects between datasets. |
| Sample classification using cell subset information | CITRUS | CITRUS uses hierarchical clustering to identify a large number of small cell subsets from cytometry data and uses a LASSO model to predict clinical outcomes. |
| | FloReMi | FloReMi is a pipeline for data preprocessing, cell subset identification, feature selection, and predictive modeling of cytometry data. FloReMi uses a Random Forest model to predict clinical outcomes from cell subset information. |
| Sample classification using single-cell data | CellCNN | CellCNN uses a convolutional neural network structure to predict clinical or biological outcomes directly from single-cell data from cytometry experiments. |
| | Deep CNN | The Deep CNN model uses a convolutional neural network structure to predict clinical or biological outcomes directly from single-cell data from cytometry experiments. The model includes a larger number of internal layers, allowing it to better capture the complex interactions between cell markers in the cytometry data. |
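To make the unsupervised clustering idea in the table above more concrete, the following is a minimal sketch of plain K-means applied to toy marker intensities. It is not the flowMeans implementation (which, per the table, combines K-means with change point detection); the function names, the farthest-first initialization, and the toy values are all assumptions chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two marker-intensity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(cells, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first cell, then repeatedly add the cell that is
    farthest from all centroids chosen so far."""
    centroids = [list(cells[0])]
    while len(centroids) < k:
        next_cell = max(
            cells,
            key=lambda c: min(squared_distance(c, m) for m in centroids),
        )
        centroids.append(list(next_cell))
    return centroids


def kmeans(cells, k, max_iter=100):
    """Lloyd's algorithm: alternate between assigning each cell to its
    nearest centroid and recomputing each centroid as the mean of its cells."""
    centroids = farthest_first_init(cells, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(cell, centroids[j]))
            for cell in cells
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: six "cells" measured on two hypothetical markers.
toy_cells = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
             (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
toy_labels, toy_centroids = kmeans(toy_cells, k=2)
print(toy_labels)  # [0, 0, 0, 1, 1, 1]
```

Real cytometry data would normally be transformed and scaled before clustering, and the number of clusters would have to be chosen rather than fixed in advance; both steps are left out of this sketch.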
Figure 1. Schematic diagrams of the machine learning approaches used to annotate cell populations in cytometry data or to classify cytometry samples.
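For sample classification based on cell subset information, each sample first has to be summarized as a fixed-length feature vector. The sketch below assumes one common choice of feature, the fraction of a sample's cells that fall in each subset, and shows only that summarization step. The function name and toy labels are hypothetical, and the downstream model (for example a LASSO or Random Forest model, as in the table) is not shown.

```python
from collections import Counter, defaultdict


def subset_frequencies(sample_ids, cell_labels, subsets):
    """Summarize each sample by the fraction of its cells in each subset.

    sample_ids  -- one sample identifier per cell
    cell_labels -- one subset label per cell (e.g. from clustering or gating)
    subsets     -- the subset labels, in the order used for the feature vector
    Returns {sample_id: [fraction for each entry of `subsets`]}.
    """
    counts = defaultdict(Counter)
    for sample, label in zip(sample_ids, cell_labels):
        counts[sample][label] += 1

    features = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for subsets absent from this sample.
        features[sample] = [counter[s] / total for s in subsets]
    return features


# Toy example: two samples with four cells each (labels are illustrative).
sample_ids = ["s1"] * 4 + ["s2"] * 4
cell_labels = ["T", "T", "B", "NK", "B", "B", "B", "T"]
features = subset_frequencies(sample_ids, cell_labels, ["T", "B", "NK"])
print(features["s1"])  # [0.5, 0.25, 0.25]
print(features["s2"])  # [0.25, 0.75, 0.0]
```

The resulting vectors have the same length for every sample regardless of how many cells were measured, which is what allows a standard classifier to be trained on them.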