Lukas M Weber, Charlotte Soneson.
Abstract
Benchmarking is a crucial step during computational analysis and method development. Recently, a number of new methods have been developed for analyzing high-dimensional cytometry data. However, it can be difficult for analysts and developers to find and access well-characterized benchmark datasets. Here, we present HDCytoData, a Bioconductor package providing streamlined access to several publicly available high-dimensional cytometry benchmark datasets. The package is designed to be extensible, allowing new datasets to be contributed by ourselves or other researchers in the future. Currently, the package includes a set of experimental and semi-simulated datasets, which have been used in our previous work to evaluate methods for clustering and differential analyses. Datasets are formatted into standard SummarizedExperiment and flowSet Bioconductor object formats, which include complete metadata within the objects. Access is provided through Bioconductor's ExperimentHub interface. The package is freely available from http://bioconductor.org/packages/HDCytoData.
Keywords: Bioconductor; ExperimentHub; benchmarking; clustering; differential analyses; high-dimensional cytometry
Year: 2019 PMID: 31857895 PMCID: PMC6904983 DOI: 10.12688/f1000research.20210.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Table 1. Summary of benchmark datasets for evaluating clustering algorithms.
For more details on these datasets, see Table 2 in ref. 4, or the HDCytoData help files.
| Dataset | ExperimentHub | Number | Number of | Number of | Type of | FlowRepository | Original |
|---|---|---|---|---|---|---|---|
| Levine_ | EH2240 – EH2241 | 265,627 | 32 | 14 | Manual gating | FR-FCM-ZZPH | |
| Levine_ | EH2242 – EH2243 | 167,044 | 13 | 24 | Manual gating | FR-FCM-ZZPH | |
| Samusik_ | EH2244 – EH2245 | 86,864 | 39 | 24 | Manual gating | FR-FCM-ZZPH | |
| Samusik_ | EH2246 – EH2247 | 841,644 | 39 | 24 | Manual gating | FR-FCM-ZZPH | |
| Nilsson_ | EH2248 – EH2249 | 44,140 | 13 | 1 (rare | Manual gating | FR-FCM-ZZPH | |
| Mosmann_ | EH2250 – EH2251 | 396,460 | 14 | 1 (rare | Manual gating | FR-FCM-ZZPH | |
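The ID ranges in the ExperimentHub column of the table above can be turned into individual resource IDs and retrieved through the ExperimentHub interface mentioned in the abstract. The following is a minimal sketch, not the package's own code: `expand_eh_range` and `fetch_benchmark` are hypothetical helpers introduced here for illustration, and the assumption that each ID in a range corresponds to one stored object should be checked against the HDCytoData help files.

```r
# Hypothetical helper (not part of HDCytoData): expand an ID range as printed
# in the table, e.g. "EH2240 – EH2241", into individual ExperimentHub IDs.
expand_eh_range <- function(range_text) {
  ends <- regmatches(range_text, gregexpr("EH[0-9]+", range_text))[[1]]
  nums <- as.integer(sub("EH", "", ends))
  paste0("EH", seq(min(nums), max(nums)))
}

levine_ids <- expand_eh_range("EH2240 - EH2241")
print(levine_ids)  # [1] "EH2240" "EH2241"

# Hypothetical wrapper: fetch one resource by its ID. Requires the
# Bioconductor package ExperimentHub and network access on first use
# (downloads are cached locally afterwards).
fetch_benchmark <- function(eh_id) {
  if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
    stop("Install ExperimentHub first: BiocManager::install('ExperimentHub')")
  }
  hub <- ExperimentHub::ExperimentHub()
  hub[[eh_id]]
}

# Not run here because it downloads data:
# d <- fetch_benchmark(levine_ids[1])
```

After `library(HDCytoData)`, the package is also understood to export one named loader function per dataset and object format (for example `Levine_32dim_SE()` and `Levine_32dim_flowSet()`); treat these names as an assumption to confirm in the help files.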
Table 2. Summary of benchmark datasets for evaluating methods for differential analyses.
For more details on these datasets, see Supplementary Note 1 in ref. 5, or the HDCytoData help files.
| Dataset | ExperimentHub | Type of data | Number | Number of | Type of | Type of | FlowRepository | Original |
|---|---|---|---|---|---|---|---|---|
| Krieg_Anti_ | EH2252 – EH2253 | Experimental | 85,715 | 24 (cell | Qualitative | Differential | FR-FCM-ZYL8 | |
| Bodenmiller_ | EH2254 – EH2255 | Experimental | 172,791 | 24 (10 cell | Qualitative | Differential | FR-FCM-ZYL8 | |
| Weber_AML_ | EH3025 – EH3046 | Semi-simulated | 157,593 | 16 (cell | Spike-in | Differential | FR-FCM-ZYL8 | |
| Weber_BCR_ | EH3047 – EH3064 | Semi-simulated | 85,331 | 24 (10 cell | Spike-in | Differential | FR-FCM-ZYL8 | |
Figure 1. Example of use case for datasets in the HDCytoData package.
This example compares three dimension reduction algorithms for visualizing cell populations in the Levine_32dim dataset (Table 1): principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). Colors indicate the known ground truth cell populations.
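The PCA part of this use case can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' analysis code: `pca_embed` is a hypothetical helper, the arcsinh cofactor of 5 is a common choice for mass cytometry rather than a value given in the text, and a small synthetic matrix stands in for the real data so that the block runs without downloads. The tSNE and UMAP panels would be produced analogously with add-on packages such as Rtsne and umap.

```r
# Hypothetical helper: arcsinh-transform a cells-by-markers expression matrix
# and project it onto the first two principal components.
# cofactor = 5 is an assumed default, not a value taken from the text.
pca_embed <- function(expr, cofactor = 5) {
  stopifnot(is.matrix(expr), is.numeric(expr))
  transformed <- asinh(expr / cofactor)
  fit <- prcomp(transformed, center = TRUE, scale. = FALSE)
  fit$x[, 1:2, drop = FALSE]
}

# Synthetic stand-in for real data: 2 populations, 100 cells each, 5 markers.
set.seed(1)
expr <- rbind(
  matrix(rexp(100 * 5, rate = 1), ncol = 5),
  matrix(rexp(100 * 5, rate = 1) + 20, ncol = 5)
)
population <- rep(c("A", "B"), each = 100)

embedding <- pca_embed(expr)

# Color each cell by its known population label.
plot(embedding, col = as.integer(factor(population)), pch = 16,
     xlab = "PC1", ylab = "PC2")

# With HDCytoData installed, the real input could be built roughly like this
# (loader, accessor, and column names are assumptions to check in the help
# files):
# library(HDCytoData); library(SummarizedExperiment)
# d_SE <- Levine_32dim_SE()
# expr <- assay(d_SE)[, colData(d_SE)$marker_class == "type"]
# population <- rowData(d_SE)$population_id
```

Keeping the embedding step as a function of a plain numeric matrix means the same call works on any of the benchmark datasets once the expression values and ground truth labels have been extracted.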