Hao-Chih Lee, Roman Kosoy, Christine E Becker, Joel T Dudley, Brian A Kidd.
Abstract
MOTIVATION: Recent advances in mass cytometry allow simultaneous measurement of up to 50 markers at single-cell resolution. However, the high dimensionality of mass cytometry data introduces computational challenges for automated data analysis and hinders translation of new biological understanding into clinical applications. Previous studies have applied machine learning to facilitate processing of mass cytometry data. However, manual inspection remains unavoidable and is becoming a barrier to reliable large-scale analysis.
Year: 2017 PMID: 28158442 PMCID: PMC5447237 DOI: 10.1093/bioinformatics/btx054
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. ACDC algorithm design and validation. (A) Schematic diagram showing the workflow of ACDC. (B) Heat maps showing the average marker intensity of landmark points and manually gated populations from the AML dataset. (C) tSNE visualization of landmark points (large circles) and manually gated populations (dots).
Fig. 2. Validation on AML and BMMC datasets. (A, E) Classification accuracy of ACDC (yellow bars), score-based classification (purple bars), and phenograph clustering (gray bars) evaluated by F1-score. (B, F) Silhouette coefficients of manually gated populations show cluster tightness. (C, G) Comparison of population frequencies estimated by the three methods versus manual gating (green bars). (D, H) Errors in estimating population frequencies. Error bars reflect the standard deviations of the accuracy estimates from the cross-validation trials described in Section 2.3.4.
Fig. 3. Validation on PANORAMA dataset. (A) Frequencies of cellular populations estimated by manual gating (green bars), ACDC (yellow bars), score-based classification (purple bars) and phenograph clustering (gray bars). All events excluded by manual gating were labeled 'unknown.' (B) Per-cell-type Pearson correlations over 10 replications. (C) Average F1-scores over 10 replications. Error bars represent standard deviations.
Fig. 4. Illustration of selected unknown clusters. (A) Two-dimensional heatmap showing the profile of an unknown cluster that shares features of CD8+ T cells, IgD+IgM+ B cells and gamma-delta T cells (rows shown below). Colors reflect the marker intensity. (B) Heatmap showing the profile of an unknown cluster that shares features of CD4+ T cells and IgD+IgM+ B cells (rows shown below). The three most similar canonical populations are shown directly below the unknown cluster.
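The captions for Figs. 2 and 3 refer to per-population F1-scores and to population frequencies compared against manual gating. The following is a minimal sketch of how such quantities could be computed from two label lists. The function names, the toy labels, and the choice of a plain per-population F1 are assumptions for illustration only; they are not taken from the ACDC implementation, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def per_population_f1(true_labels, predicted_labels):
    """Return {population: F1}, treating true_labels as the reference (e.g. manual gating)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # event missed for its true population
            fp[pred] += 1   # event wrongly assigned to the predicted population
    scores = {}
    for pop in set(true_labels) | set(predicted_labels):
        denom = 2 * tp[pop] + fp[pop] + fn[pop]
        scores[pop] = 2 * tp[pop] / denom if denom else 0.0
    return scores


def population_frequencies(labels):
    """Return {population: fraction of all events carrying that label}."""
    total = len(labels)
    return {pop: n / total for pop, n in Counter(labels).items()}


# Hypothetical toy labels, not data from the paper.
manual = ["T", "T", "T", "B", "B", "NK"]
predicted = ["T", "T", "B", "B", "B", "NK"]

f1 = per_population_f1(manual, predicted)
freq_manual = population_frequencies(manual)
freq_predicted = population_frequencies(predicted)
# Absolute error in the estimated frequency of each population.
freq_error = {pop: abs(freq_predicted.get(pop, 0.0) - freq_manual[pop])
              for pop in freq_manual}
```

Here `f1` holds one score per population and `freq_error` holds the absolute difference between estimated and manually gated frequencies, which is one plausible reading of the "errors in estimating population frequencies" shown in Fig. 2.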
Computational performance of ACDC
| Dataset | Accuracy (%), k-nn = 10 | Accuracy (%), k-nn = 20 | Accuracy (%), k-nn = 30 | Time (s), k-nn = 10 | Time (s), k-nn = 20 | Time (s), k-nn = 30 | Events |
|---|---|---|---|---|---|---|---|
| BMMC | 92.02 | 92.24 | 92.49 | 245 | 309 | 376 | 81747 |
| AML | 98.36 | 98.30 | 98.25 | 884 | 992 | 1077 | 103184 |
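
The run times and event counts in the table can be combined into a rough throughput figure (events processed per second). The short sketch below does only that arithmetic on the tabulated values; the variable and function names are illustrative assumptions, and the result says nothing about the hardware or implementation details, which this record does not give.

```python
# Values copied from the table: number of events and run time in seconds
# for each k-nn setting (10, 20, 30).
runs = {
    "BMMC": {"events": 81747, "time_s": {10: 245, 20: 309, 30: 376}},
    "AML": {"events": 103184, "time_s": {10: 884, 20: 992, 30: 1077}},
}


def events_per_second(events, seconds):
    """Throughput as events divided by wall-clock seconds."""
    return events / seconds


throughput = {
    name: {k: events_per_second(run["events"], t) for k, t in run["time_s"].items()}
    for name, run in runs.items()
}

for name, by_k in throughput.items():
    for k, rate in by_k.items():
        print(f"{name}, k-nn = {k}: {rate:.1f} events/s")
```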