Joachim Ludwig¹, Christian Höner Zu Siederdissen², Zishu Liu¹, Peter F Stadler³, Susann Müller¹.
Abstract
BACKGROUND: Flow cytometry (FCM) is a powerful single-cell based measurement method to ascertain multidimensional optical properties of millions of cells. FCM is widely used in medical diagnostics and health research. There is also a broad range of applications in the analysis of complex microbial communities. The main concern in microbial community analyses is to track the dynamics of microbial subcommunities. So far, this has been achieved with time-consuming manual clustering procedures that require extensive user-dependent input. In addition, several tools have recently been developed using different approaches; these, however, focus mainly on clustering medical FCM data or microbial samples with a well-known background, while much less work has been done on high-throughput, online algorithms for two-channel FCM.
Keywords: Clustering; Data analysis; Expectation-Maximization; Flow cytometry; Microbial communities; Statistical analysis
Year: 2019 PMID: 31815609 PMCID: PMC6902487 DOI: 10.1186/s12859-019-3152-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Steps for removing noise and beads from the cytometric dot plot. a Technical noise removed by setting a parent gate in forward-scatter vs. side-scatter. b Technical noise removed by setting a parent gate in forward-scatter vs. DAPI fluorescence. c Beads removed in forward-scatter vs. DAPI fluorescence. d Cytometric dot plot only containing cells used as input for flowEMMi
Fig. 2 Results of flowEMMi after subsampling and calculation of the BIC for the sample shown in Fig. 1 with separation of cell clusters and background clusters. Background clusters are not encircled and have a gray colour. a Curve of the BIC value shown for . b R dot plot with linear axes values from 0 to 65 536 containing only every 40th data point. c Clustering result of flowEMMi for c=13, calculated as the most appropriate number of clusters, with 10 cell clusters and 3 background clusters. d Clustering result of flowEMMi for c=5 with 4 cell clusters and 1 background cluster. e Clustering result of flowEMMi for c=20 with 14 cell clusters and 6 background clusters
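The caption above relies on two ideas: thinning the data to every 40th point and using the BIC to pick the number of clusters c. The following is a minimal sketch of both, not the flowEMMi implementation. It assumes a two-dimensional Gaussian mixture with full covariance matrices and the "lower is better" form of the BIC; the function names and the log-likelihood values in the usage note are hypothetical and only illustrate the mechanics.

```python
import math


def subsample(points, step=40):
    """Keep every `step`-th data point (the caption mentions every 40th point)."""
    return points[::step]


def gmm_free_parameters(c, dim=2):
    """Free parameters of a c-component Gaussian mixture with full covariances."""
    means = c * dim
    covariances = c * dim * (dim + 1) // 2  # symmetric dim x dim matrix per component
    weights = c - 1  # mixing proportions sum to 1
    return means + covariances + weights


def bic(log_likelihood, n_points, c, dim=2):
    """BIC in the 'lower is better' convention: k * ln(n) - 2 * ln(L).

    Assumption: some packages report the negated value instead.
    """
    return gmm_free_parameters(c, dim) * math.log(n_points) - 2.0 * log_likelihood


def best_cluster_number(log_likelihoods, n_points):
    """Return the c with the smallest BIC; `log_likelihoods` maps c to ln(L)."""
    return min(log_likelihoods, key=lambda c: bic(log_likelihoods[c], n_points, c))
```

With made-up log-likelihoods such as `{5: -12000.0, 13: -11000.0, 20: -10990.0}` for 5000 points, `best_cluster_number` selects 13: the gain in likelihood from c=13 to c=20 is too small to pay for the additional parameters.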
Comparison of running time without and with usage of the subsampling procedure. Mean values (mean) and standard deviations (SD) of the total running time and the number of iterations for were calculated based on three executions of flowEMMi, respectively
| | Number of iterations (mean) | Number of iterations (SD) | Total running time, mm:ss (mean) | Total running time, mm:ss (SD) |
|---|---|---|---|---|
| without subsampling | 228 | 83 | 24:11 | 00:24 |
| with subsampling | 102 | 31 | 05:31 | 00:34 |
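The effect of subsampling in the table above can be expressed as ratios. The short sketch below only redoes that arithmetic from the reported mean values; the helper name `mmss_to_seconds` is hypothetical.

```python
def mmss_to_seconds(text):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Mean values taken from the table above.
time_without = mmss_to_seconds("24:11")
time_with = mmss_to_seconds("05:31")

runtime_speedup = time_without / time_with  # how many times faster with subsampling
iteration_ratio = 228 / 102  # ratio of mean iteration counts

print(f"running time: x{runtime_speedup:.2f}, iterations: x{iteration_ratio:.2f}")
```

On these means, subsampling cuts the total running time by roughly a factor of 4.4 and the number of iterations by roughly a factor of 2.2.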
Fig. 3 Final result of flowEMMi using prior distribution parameters achieved from the subsampling procedure and an extended range of achieved from the BIC to find rare cell clusters. a R dot plot with linear axes values from 0 to 65 536 containing all data points. b Clustering result of flowEMMi for c=14 with 12 cell clusters and 2 background clusters. Background clusters are not encircled and have a gray colour
Comparison of clustering results from manual clustering performed by 5 experts using FlowJo and automated clustering using flowEMMi
| | # clusters | Range of abundances (%) | Cell numbers, foreground (%) | Cell numbers, background (%) | # congruent clusters |
|---|---|---|---|---|---|
| flowEMMi | 12 | 1.56 - 20.72 | 71.6 | 28.4 | 12 |
| User 1 | 13 | 0.25 - 27.7 | 76.5 | 23.5 | 10 |
| User 2 | 15 | 0.21 - 30.0 | 82.1 | 17.9 | 11 |
| User 3 | 13 | 0.24 - 28.2 | 79.1 | 20.9 | 11 |
| User 4 | 16 | 0.26 - 31.2 | 90.7 | 9.3 | 11 |
| User 5 | 15 | 0.22 - 32.1 | 91.6 | 8.4 | 11 |
Compared were i) the number of clusters that were found, ii) the range of the abundance values of all clusters, iii) the cell numbers of foreground/background cells and iv) the number of congruent clusters that were found by both the user and flowEMMi. Congruent clusters are cell clusters having the same or similar mean values in both parameters (FSC and DAPI fluorescence)
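The notion of congruent clusters can be illustrated with a small matching routine. This is only a sketch under stated assumptions: the source does not say how "similar" mean values were judged, so the Euclidean distance, the greedy one-to-one matching and the `tolerance` value (in channel units on the 0 to 65 536 axes) are all hypothetical choices.

```python
import math


def count_congruent(auto_means, manual_means, tolerance=2000.0):
    """Count automated clusters whose (FSC, DAPI) mean lies close to a manual one.

    Each manual cluster can be matched at most once (greedy nearest match).
    The tolerance is an assumed value, not one taken from the source.
    """
    unmatched = list(manual_means)
    congruent = 0
    for auto in auto_means:
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda manual: math.dist(auto, manual))
        if math.dist(auto, nearest) <= tolerance:
            congruent += 1
            unmatched.remove(nearest)
    return congruent


# Made-up cluster means, for illustration only.
auto = [(10000, 20000), (30000, 40000), (50000, 10000)]
manual = [(10500, 19800), (30100, 41000), (60000, 60000)]
print(count_congruent(auto, manual))
```

In this made-up example two of the three automated clusters have a manual counterpart within the tolerance; the third does not.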
Comparison of automated clustering approaches
| Tool | Running time (h:mm:ss) | Determine number of clusters | Shape of clusters | Separate background | Calculate cell numbers |
|---|---|---|---|---|---|
| flowEMMi | 0:05:31 | yes | ellipsoid | yes | yes |
| flowFP | 0:00:03 | no | rectangular | no | not applicable |
| SamSPECTRAL | 0:06:25 | space-part. | arbitrary | no | not applicable |
| flowDensity | 0:00:02 | no | rectangular | no | not applicable |
| flowMeans | 0:00:17 | space-part. | non-spherical | no | not applicable |
| flowClust | 1:15:30 | yes | ellipsoid | no | yes |
| flowMerge | (Table | yes | ellipsoid | yes | yes |
| FLAME | −∗ | −∗ | −∗ | −∗ | −∗ |
Automated approaches were compared regarding the running time and the abilities to identify rare cell types, to separate cell clusters from background clusters and to calculate the real cell numbers for each cell cluster. Running time was measured on an Intel(R) Core(TM) i5-3210M CPU @ 2.5 GHz with 4096 MB RAM and Windows 7 Enterprise 64-Bit Edition. FLAME: “−∗” denotes that no results were received, as our submitted “jobs” always remained in the queue for several days and were later cancelled by the server. flowEMMi is the implementation discussed in this work. space-part. denotes k-means type algorithms that do not produce tight clusters
Fig. 4 Results of clustering tools. a Result of flowEMMi. 12 cell clusters and 2 background clusters were identified. b Result of flowFP for 4 recursions = 16 clusters. c Result for SamSPECTRAL with adjusted parameters (σ=1 000, separation=0.3) and automatically determined number of clusters. d Result of flowDensity with overlapping densities. e Result of flowMeans with Voronoi-like cluster shapes (MaxN=20). f Result of flowClust with automatically determined best number of clusters for (cf. detailed analysis of flowMerge in Table 4 and discussion)
Running times and F1 score aggregated over experiments with different ε stopping criteria
| Label | ε | Time (mean) | Time (sd) | F1 score (mean) | F1 score (sd) |
|---|---|---|---|---|---|
| flowEMMi | 1.0 | 528 | 53 | 0.56 | 0.18 |
| flowEMMi | 0.01 | 1 080 | 214 | 0.59 | 0.17 |
| flowEMMi | 10⁻⁵ | 1 445 | 182 | 0.56 | 0.17 |
| flowMerge | 1.0 | 8 391 | 3 239 | 0.54 | 0.24 |
| flowMerge | 0.01 | 8 951 | 3 597 | 0.51 | 0.17 |
| flowMerge | 10⁻⁵ | 56 652 | 53 379 | 0.54 | 0.17 |
Times and F1 scores (with their standard deviations, sd) are aggregated over four experiments and 5 expert user gatings each. Note that the default flowMerge stopping criterion of 10⁻⁵ yields running times in excess of 1 day. flowEMMi consistently yields better F1 measures, with an average improvement of 4% to 16% over flowMerge, and much better running times, easily yielding speed improvements of ×8 to ×15 or better. For both algorithms, a more stringent EM stopping criterion tends to increase the F1 score; for flowMerge in particular, however, this comes at prohibitive running time costs
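The figures quoted in this note can be retraced from the table. The sketch below uses the standard definition of the F1 score and recomputes the relative F1 improvements and the speed-ups per stopping criterion from the reported means. How clusters were matched to expert gates to obtain the underlying counts is not stated here, so `f1_score` is shown only for reference; all function names are hypothetical.

```python
def f1_score(true_positive, false_positive, false_negative):
    """Standard F1: harmonic mean of precision and recall."""
    denominator = 2 * true_positive + false_positive + false_negative
    return 2 * true_positive / denominator if denominator else 0.0


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.04 for 4%."""
    return (new - old) / old


# Mean values from the table above, keyed by stopping criterion.
# Each entry holds (mean time, mean F1).
flowemmi = {"1.0": (528, 0.56), "0.01": (1080, 0.59), "1e-5": (1445, 0.56)}
flowmerge = {"1.0": (8391, 0.54), "0.01": (8951, 0.51), "1e-5": (56652, 0.54)}

for eps in flowemmi:
    time_a, f1_a = flowemmi[eps]
    time_b, f1_b = flowmerge[eps]
    gain = relative_improvement(f1_a, f1_b) * 100
    speedup = time_b / time_a
    print(f"eps={eps}: F1 gain {gain:.1f}%, speed-up x{speedup:.1f}")
```

The smallest and largest F1 gains computed this way round to 4% and 16%, and the speed-ups range from about ×8 to about ×39 at the most stringent criterion.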