Xiaoxin Ye, Joshua W K Ho.
Abstract
BACKGROUND: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It can measure the expression of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them do not scale to very large data sets with more than ten million cells.
Keywords: Clustering; DBSCAN; Flow cytometry; Single cell
Year: 2019 PMID: 30953498 PMCID: PMC6449887 DOI: 10.1186/s12918-019-0690-2
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 An illustrative example of the FlowGrid clustering algorithm. In this example, Bin 1, Bin 2, Bin 4 and Bin 6 are core bins because their Den_b is larger than MinDen_b (5 in this example), their Den_c is larger than MinDen_c (20 in this example), and their Den_b is larger than ρ% (75% in this example) of their directly connected bins. The distance between Bin 1 and Bin 2 satisfies the threshold set by ε (ε = 2 in this example), so Bin 1 and Bin 2 are directly connected. The same holds for Bin 2 and Bin 4, so Bin 2 and Bin 4 are directly connected. Therefore, Bin 1, Bin 2 and Bin 4 are mutually connected, and they are assigned to the same cluster. Bin 5 is not a core bin but is a border bin, as it is directly connected to Bin 6, which is a core bin. Bin 3 is an outlier bin, as it is neither a core bin nor a border bin. In practice, MinDen_b is set to 3, MinDen_c is set to 40 and ρ is set to 85.
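The bin classification described in the Fig. 1 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `flowgrid_sketch`, its parameter names, and the min-max scaling step are hypothetical. The sketch also assumes that the first density threshold applies to a bin's own cell count, that the second applies to the count summed over the bin and its directly connected bins, that two bins are directly connected when the squared distance between their integer grid coordinates is at most ε, and that the ρ test means a bin must be denser than at least ρ% of its directly connected bins.

```python
from collections import Counter, deque
from itertools import product
import math

import numpy as np


def flowgrid_sketch(X, n_bins=4, eps=2, min_den_b=3, min_den_c=40, rho=85):
    """Grid-based density clustering in the spirit of Fig. 1 (illustrative only).

    Returns one integer label per row of X; -1 marks cells in outlier bins.
    """
    X = np.asarray(X, dtype=float)

    # Assumption: min-max scale each marker to [0, 1], then map cells to integer bins.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    scaled = (X - X.min(axis=0)) / span
    bins = np.minimum((scaled * n_bins).astype(int), n_bins - 1)
    keys = [tuple(b) for b in bins.tolist()]
    den_b = Counter(keys)  # cells per non-empty bin

    # Assumption: bins are directly connected if their squared grid distance is <= eps.
    r = int(math.sqrt(eps))
    offsets = [o for o in product(range(-r, r + 1), repeat=X.shape[1])
               if 0 < sum(v * v for v in o) <= eps]

    def neighbours(key):
        """Yield the non-empty bins directly connected to `key`."""
        for o in offsets:
            nb = tuple(k + v for k, v in zip(key, o))
            if nb in den_b:
                yield nb

    # Core test: all three density criteria from the caption must hold.
    core = set()
    for key, d in den_b.items():
        nbs = list(neighbours(key))
        den_c = d + sum(den_b[n] for n in nbs)       # bin plus connected bins
        denser_than = sum(d > den_b[n] for n in nbs)
        rho_ok = (not nbs) or denser_than >= rho / 100 * len(nbs)
        if d > min_den_b and den_c > min_den_c and rho_ok:
            core.add(key)

    # Grow clusters from core bins; border bins join a cluster but are not expanded.
    label = {}
    next_id = 0
    for seed in den_b:
        if seed not in core or seed in label:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for nb in neighbours(cur):
                if nb not in label:
                    label[nb] = next_id
                    if nb in core:
                        queue.append(nb)
        next_id += 1

    # Bins that were never reached are outlier bins.
    return np.array([label.get(k, -1) for k in keys])
```

Because the work after binning depends on the number of non-empty bins rather than the number of cells, a sketch like this processes the cell-level data only once, which is the property a grid-based approach relies on for large data sets.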
Comparison of runtime (in seconds) of FlowGrid against other clustering algorithms
| Data set | Samples | Markers | Cells | FlowGrid (s) | FlowSOM (s) | FlowPeaks (s) | Flock (s) |
|---|---|---|---|---|---|---|---|
| Multi-center | 16 | 8 | 29-77 ×10³ | 0.23 ± 0.09 | 4.01 ± 1.08 | 2.27 ± 0.61 | 10.3 ± 3.45 |
| Flow-CAP-GvHD | 12 | 4 | 12-33 ×10³ | 0.07 ± 0.04 | 2.16 ± 0.54 | 0.28 ± 0.16 | 0.58 ± 0.28 |
| Flow-CAP-DLBL | 30 | 3 | 2-25 ×10³ | 0.04 ± 0.01 | 1.25 ± 0.32 | 0.10 ± 0.09 | 0.22 ± 0.16 |
| Flow-CAP-HSCT | 30 | 4 | 6-9 ×10³ | 0.04 ± 0.02 | 1.35 ± 0.28 | 0.11 ± 0.02 | 0.28 ± 0.06 |
| Seaflow0 | - | 4 | 23.6 ×10⁶ | 11.51 | 572.65 | NA | 6628.30 |
| Seaflow1 | - | 4 | 12.7 ×10⁶ | 3.09 | 312.95 | 258.13 | NA |
| Seaflow11 | - | 4 | 22.7 ×10⁶ | 6.37 | 544.79 | NA | NA |
NA indicates that the algorithm failed with an error on that data set.
Comparison of accuracy (in ARI) of FlowGrid against other clustering algorithms
| Data set | FlowGrid (ARI) | FlowSOM (ARI) | FlowPeaks (ARI) | Flock (ARI) |
|---|---|---|---|---|
| Multi-center | 0.66 ± 0.20 | 0.75 ± 0.17 | 0.68 ± 0.20 | 0.66 ± 0.16 |
| Flow-CAP-GvHD | 0.79 ± 0.15 | 0.85 ± 0.11 | 0.72 ± 0.16 | 0.47 ± 0.20 |
| Flow-CAP-DLBL | 0.85 ± 0.10 | 0.84 ± 0.10 | 0.82 ± 0.15 | 0.84 ± 0.09 |
| Flow-CAP-HSCT | 0.90 ± 0.08 | 0.87 ± 0.14 | 0.83 ± 0.24 | 0.57 ± 0.27 |
| Seaflow0 | 0.94 | 0.81 | NA | 0.27 |
| Seaflow1 | 0.59 | 0.54 | 0.34 | NA |
| Seaflow11 | 0.77 | 0.33 | NA | NA |
NA indicates that the algorithm failed with an error on that data set.
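For readers unfamiliar with the metric in the accuracy table, the adjusted Rand index (ARI) compares two labelings of the same cells, for example manual gating labels and cluster labels, and corrects the pair-counting agreement for chance. The sketch below computes it from the contingency counts using the standard formula. The function name `adjusted_rand_index` is hypothetical, and the source does not say which implementation produced the reported values.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs of items that share a group in both labelings, and in each one alone.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2           # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (sum_joint - expected) / (maximum - expected)
```

A value of 1 means the two labelings group the cells identically, regardless of how the groups are named, and values near 0 mean agreement no better than chance.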
Fig. 2 Visual comparison of the clustering performance of FlowGrid, FlowPeaks, FlowSOM, and Flock using manual gating (top row) as the gold standard
Fig. 3 Comparison of the runtime of FlowGrid, FlowPeaks, FlowSOM, and Flock using data sets with different numbers of cells
Fig. 4 Sensitivity analysis of three different parameters on clustering accuracy (as measured by adjusted Rand index; ARI) and runtime (seconds)