Ali Bashashati, Ryan R Brinkman.
Abstract
Flow cytometry (FCM) is widely used in health research and in treatment for a variety of tasks, such as in the diagnosis and monitoring of leukemia and lymphoma patients, providing the counts of helper-T lymphocytes needed to monitor the course and treatment of HIV infection, the evaluation of peripheral blood hematopoietic stem cell grafts, and many other diseases. In practice, FCM data analysis is performed manually, a process that requires an inordinate amount of time and is error-prone, nonreproducible, nonstandardized, and not open for re-evaluation, making it the most limiting aspect of this technology. This paper reviews state-of-the-art FCM data analysis approaches using a framework introduced to report each of the components in a data analysis pipeline. Current challenges and possible future directions in developing fully automated FCM data analysis tools are also outlined.
Year: 2009 PMID: 20049163 PMCID: PMC2798157 DOI: 10.1155/2009/584603
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Two-dimensional sequential gating example. (a) The operator selects a subset of “interesting” events (shown within the ellipsoid region). (b) The events selected in (a) are observed and further analyzed using other dimensions of the data. The axes represent different parameters corresponding to physical and chemical characteristics of the analyzed cells.
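The sequential gating described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names (`FSC`, `SSC`, `FL1`, `FL2`), the synthetic Gaussian populations, and the gate positions are all hypothetical and are not taken from the paper. An operator would normally draw the gates by eye; here they are hard-coded ellipses.

```python
import math
import random


def in_ellipse(x, y, cx, cy, rx, ry, angle=0.0):
    """Return True if (x, y) falls inside an ellipse gate.

    (cx, cy) is the centre, rx and ry are the semi-axes, and angle is the
    rotation of the ellipse in radians.
    """
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Express the point in the ellipse's own (rotated) coordinate system.
    u = dx * cos_a + dy * sin_a
    v = -dx * sin_a + dy * cos_a
    return (u / rx) ** 2 + (v / ry) ** 2 <= 1.0


def apply_gate(events, x_param, y_param, **ellipse):
    """Keep only the events whose (x_param, y_param) values fall in the gate."""
    return [e for e in events if in_ellipse(e[x_param], e[y_param], **ellipse)]


def make_population(n, means, sds, rng):
    """Draw n synthetic events with independent Gaussian parameters (assumption)."""
    params = ("FSC", "SSC", "FL1", "FL2")  # hypothetical parameter names
    return [
        {p: rng.gauss(m, s) for p, m, s in zip(params, means, sds)}
        for _ in range(n)
    ]


rng = random.Random(0)
events = (
    make_population(500, (200, 100, 500, 600), (15, 10, 40, 50), rng)  # cells of interest
    + make_population(500, (50, 30, 50, 50), (10, 8, 20, 20), rng)     # e.g. debris
)

# Step (a): select "interesting" events on the first pair of parameters.
step_a = apply_gate(events, "FSC", "SSC", cx=200, cy=100, rx=45, ry=30)
# Step (b): analyse only those events further, on a second pair of parameters.
step_b = apply_gate(step_a, "FL1", "FL2", cx=500, cy=600, rx=120, ry=150)

print(len(events), len(step_a), len(step_b))
```

Each gate only ever sees the events that survived the previous one, which is why an error in an early gate propagates to every later step.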
Figure 2. Proposed FCM data analysis framework.
Summary of survey (M: manual; Y: yes; E: embedded in gating; U: unsupervised; S: supervised; “—”: not supported, not implemented, not applicable; “||”: same as above). Note that this table does not report Quality Assessment, Normalization, and Feature Extraction components.
| Paper | Outlier removal | Automated gating: method | Automated gating: supervised/unsupervised | Automated gating: multidimensional | Automated gating: automated # of clusters | Labelling | Interpretation (classification/comparison of samples) |
|---|---|---|---|---|---|---|---|
| [ | Logical and cleaning morphological operators applied to the corresponding image representation of FCM data | Logical operation on image representation of FCM data followed by thickening | U | — | — | Based on location and abundance of populations | — |
| || | Majority operator applied to the image representation of FCM data followed by Sobel edge detection | U | — | — | || | — | |
| || | Zero-degree B-spline smoother applied to the 2-dimensional FCM data followed by break point detection | U | — | — | || | — | |
| || | Gath-Geva fuzzy clustering | U | — | — | || | — | |
| [ | Embedded in clustering (cluster membership weights can be used to exclude outliers) | Gaussian Mixture Models | U | Y | M | — | |
| Y (using BIC) | |||||||
| [ | Embedded in clustering (cluster membership weights can be used to exclude outliers) | t-Mixture Models | U | Y | M | — | |
| Y (using BIC) | |||||||
| [ | Embedded in clustering (excluding events that are far from Gaussian functions centers using a predefined cutoff value) | Mahalanobis distance from centroids of multivariate Gaussian functions used for classification task | S | — | — | E | — |
| [ | — | Multilayer perceptron (MLP) | S | Y | — | E | — |
| [ | — | Building templates for automated gating by using a cluster-finding algorithm (Becton Dickinson's (BD) snap-to gate algorithm) | U | — | — | E (initially set by operator) | — |
| [ | — | DKLL (an extension of the | U | Y | — | — | — |
| — | Fuzzy | U | Y | — | — | — | |
| — | Fuzzy | U | Y | — | — | — | |
| — | Fuzzy | U | Y | — | — | — | |
| — | Fuzzy | U | Y | — | — | — | |
| [ | — | M | — | — | — | M | Complete linkage hierarchical clustering |
| [ | — | — | — | — | — | — | Comparing sample to a reference sample by probability binning algorithm |
| [ | — |
| U | Y | Histogram feature guided | — | — |
| Partition index guided | — | ||||||
| [ | — | Frequency difference gating approach (defines one or more gates that contain statistically significantly more events in the test sample than in the control sample)¹ | U | Y | — | — | — |
| [ | — | MLP | S | Y | — | E | — |
| — | Learning vector quantization (LVQ) | S | Y | — | E | — | |
| — | Radial basis function (RBF) | S | Y | — | E | — | |
| — | Asymmetric RBF | S | Y | — | E | — | |
| — | Classification by modeling each class with Gaussian distributions | S | Y | — | E | — | |
| — |
| S | Y | — | E | — | |
| — | Kohonen's self organizing map (SOM) | U | Y | — | M | — | |
| [ | — | Static gates applied to data | U | — | — | E (initially set by operator) | CLASSIF1 approach [ |
| [ | — | Building templates for automated gating by using a cluster-finding algorithm (BD Snap-to gate algorithm) | U | — | — | E (initially set by operator) | — |
| [ | — | M | — | — | — | M | Functional linear discriminant analysis |
| [ | — | Building templates for automated gating by using a cluster-finding algorithm (BD's snap-to gate algorithm) | U | — | — | E (initially set by operator) | — |
| [ | — | Gaussian Mixture Models | U | — | M | M | — |
| [ | — | M | — | — | — | M | Average-linkage hierarchical clustering |
| [ | — | M | — | — | — | M | Classification based on a semantic network of knowledge base through a hierarchical tree (if-then rule mechanism) |
| [ | — |
| U | Y | — | M | — |
| — | Calculating modes of the density function (estimated by kernel density estimation) followed by a nearest-neighbour heuristic | U | Y | — | M | — | |
| — | Gaussian mixture models using Markov chain Monte Carlo (MCMC) | U | Y | — | M | — | |
| [ | — | Building templates for automated gating by using a cluster-finding algorithm (BD's snap-to gate algorithm) | U | — | — | E (initially set by operator) | — |
| [ | — | Automated gating using BD Simulset software | — | — | — | M | Correlation tests using Spearman's method |
| [ | — | Image representation of randomly selected events from a group of flow data followed by smoothing, regional maxima detection and watershed algorithm to define the gates to apply to all the data | U | — | — | — | — |
| [ | — | SOM | U | — | — | M | — |
| — | Cluster analysis with Winlist (Verity Software House, USA) | U | — | — | || | — | |
| [ | — | Static gates applied to data and self adjusting gates (details not mentioned) for lymphocytes, monocytes, and granulocytes | U | — | — | E (initially set by operator) | CLASSIF1 approach [ |
| [ | — | Fcom tool (an analysis tool in Winlist (Verity Software House, USA)) | — | — | — | M | Average-linkage hierarchical clustering |
| [ | — | Static gates applied to data and self adjusting gates for lymphocytes, monocytes, and granulocytes | U | — | — | E (initially set by operator) | CLASSIF1 approach [ |
| [ | — | M | — | — | — | M | “Professor Fidelio” (a heuristic classification system that reasons on the basis of defined diagnostic patterns [ |
| [ | — |
| U | — | — | M | — |
| — |
| U | — | — | M | — | |
| — | Preclustering a subset of the data by | U | — | — | M | — | |
| [ | E (excluding the events that were more than a set number of standard deviations away from the centroids of the clusters) |
| U | Y | — | M | — |
| [ | — | MLP | S | Y | — | E | — |
| [ | — | RBF | S | Y | — | E | — |
| [ | — | MLP | S | Y | — | E | — |
| — | SOM | U | Y | — | M | — | |
| E (excluding the events that were more than a set number of standard deviations away from the centroids of the cluster) |
| U | Y | — | M | — | |
| [ | — | No gating—mean fluorescent intensities of antibodies were used for next stage of analysis | — | — | — | — | MLP |
| [ | — | RBF | S | Y | — | E | — |
| [ | — | — | — | — | — | — | Histogram of one parameter of FCM data followed by MLP |
| [ | — | Classification and regression trees (CARTs) | S | Y | — | E | — |
| [ | — | Support vector machine (SVM) | S | Y | — | E | — |
| — | RBF | S | Y | — | E | — | |
| [ | — | RBF using radially symmetric basis function (based on Euclidean distance) | S | Y | — | E | — |
| — | RBF using more general arbitrarily oriented ellipsoidal basis functions (based on Mahalanobis distance) | S | Y | — | E | — | |
| [ | — | Gaussian mixture model clustering | U | Y | — | — | — |
| [ | Embedded in clustering (excluding events that are far from Gaussian functions centers using a predefined cutoff value) | Mahalanobis distance from the centroids of multivariate Gaussian functions used for classification task | S | — | — | E | — |
| [ | — | M | — | — | — | — | Classification based on a shrunken centroids approach [ |
| Hierarchical clustering | |||||||
| [ | — | M | — | — | — | — | Kernel density estimation followed by calculating differences between patients by Kullback-Leibler divergence to form a similarity matrix, then dimensionality reduction by multidimensional scaling for 2-dimensional visualization |
| [ | — | RBF | S | Y | — | E | — |
| [ | — | M | — | — | — | — | Hierarchical clustering |
| Principal component analysis (PCA) for dimensionality reduction and visualization to see if classes are separable by looking at the first few principal components | |||||||
¹ Closely related to the probability binning algorithm introduced in [52].
² This study utilizes the quality assessment strategy introduced in [42], which is based on comparison of density, ECDF (empirical cumulative distribution function), box plots, and two types of bivariate plots of similar samples.
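As a rough illustration of the ECDF comparison mentioned in the quality assessment footnote, the sketch below builds the empirical cumulative distribution function of one parameter for two samples and reports the largest vertical gap between them. Summarising the comparison with this single number (the two-sample Kolmogorov-Smirnov statistic) is an assumption made here for illustration; the cited strategy may rely on visual inspection of the curves instead. The sample values are made up.

```python
import bisect


def ecdf(sample):
    """Return the empirical CDF of a sample as a function F(x) = P(X <= x)."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # Number of observations <= x, divided by the sample size.
        return bisect.bisect_right(xs, x) / n

    return F


def max_ecdf_gap(sample_a, sample_b):
    """Largest vertical distance between the two ECDFs.

    Both ECDFs are right-continuous step functions, so the maximum gap is
    attained at one of the observed values.
    """
    fa, fb = ecdf(sample_a), ecdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(fa(x) - fb(x)) for x in points)


# Made-up intensities for one parameter in three samples of the same type.
reference = [1.0, 2.0, 3.0, 4.0]
similar = [1.0, 2.0, 3.0, 4.0]
shifted = [3.0, 4.0, 5.0, 6.0]

print(max_ecdf_gap(reference, similar))  # 0.0
print(max_ecdf_gap(reference, shifted))  # 0.5
```

A sample whose gap to its peers is unusually large would be a candidate for closer inspection; what counts as "unusually large" is left open here.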
Figure 3. Percentages of studies that address different data analysis components according to the proposed framework. Note that cluster labeling approaches that are embedded in the gating stage are counted in the “Cluster Labeling” entry.
Figure 4. (a) and (b) Examples of cases where flow cytometer voltage changes have caused a shift in the absolute position of the populations within the ellipsoid gates.