Literature DB >> 28158442

Automated cell type discovery and classification through knowledge transfer.

Hao-Chih Lee^1,2, Roman Kosoy¹, Christine E Becker^1,2, Joel T Dudley^1,2, Brian A Kidd^1,2.

Abstract

MOTIVATION: Recent advances in mass cytometry allow simultaneous measurements of up to 50 markers at single-cell resolution. However, the high dimensionality of mass cytometry data introduces computational challenges for automated data analysis and hinders translation of new biological understanding into clinical applications. Previous studies have applied machine learning to facilitate processing of mass cytometry data. However, manual inspection is still inevitable and becoming the barrier to reliable large-scale analysis.
RESULTS: We present a new algorithm called utomated ell-type iscovery and lassification (ACDC) that fully automates the classification of canonical cell populations and highlights novel cell types in mass cytometry data. Evaluations on real-world data show ACDC provides accurate and reliable estimations compared to manual gating results. Additionally, ACDC automatically classifies previously ambiguous cell types to facilitate discovery. Our findings suggest that ACDC substantially improves both reliability and interpretability of results obtained from high-dimensional mass cytometry profiling data.
AVAILABILITY AND IMPLEMENTATION: A Python package (Python 3) and analysis scripts for reproducing the results are availability on https://bitbucket.org/dudleylab/acdc . CONTACT: brian.kidd@mssm.edu or joel.dudley@mssm.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical

Mesh：

Substances：
Biomarkers

Year: 2017 PMID： 28158442 PMCID： PMC5447237 DOI： 10.1093/bioinformatics/btx054

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

High-throughput, high-dimensional cytometry is one of the most valuable tools for basic and clinical immunology. Advances in this technology over the last decade now provide simultaneous measurements of dozens of proteins at single-cell resolution (Bandura ; Spitzer and Nolan, 2016). Mass cytometry by time-of-flight (CyTOF) provides a powerful new tool for studying cellular diversity and dynamics by measuring up to 50 markers per cell. Many recent studies highlight the utility of CyTOF for enabling novel discovery and understanding in multiple domains of immunology, including mapping cell subset heterogeneity and specificity in response to various pathogens (Newell , 2013), precise elucidation of cellular networks and biochemical pathway activation following drug perturbation (Bendall ; Bodenmiller ), as well as new understanding of cellular trafficking and tissue localization (Wong et al., 2016a, b). However, the high number of measures and complexity of the resulting data restrict manual exploration and present challenges for both the analysis and biological interpretation of CyTOF data (Newell and Cheng, 2016). New tools that automate the data analysis are needed to realize the full potential of CyTOF for biological discovery and translational applications. A number of studies have focused on applying or developing algorithms to address the data analysis and interpretation challenges arising from CyTOF data. One early approach applied machine learning techniques to detect clusters of similar immune cell types in high dimensional space (Aghaeepour ; Qiu ). More recently, researchers have used network analysis techniques to assist the identification of known and novel cell populations (Levine ; Samusik ; Shekhar ). In concert with these analytical advances, a number of studies have developed software tools to organize and visualize the high-dimensional cytometry data (Amir ; Shekhar ; Van der Maaten and Hinton, 2008). Yet, to date, the available computational tools still require substantial manual manipulation to extract biological findings and interpret the data. These manual steps create a major limitation for exploring the full dataset and taking advantage of the large number of markers in CyTOF. One of the biggest challenges for interpreting mass cytometry data is how best to annotate individual cells with canonical cell types. This difficulty arises from (i) uncertainty in defining cell types based on more than a handful of markers and (ii) the absence of biological information as an input for machine learning techniques. Current approaches require substantial manual inspection that impedes the analysis workflow, underutilizes the full value of the high-dimensional data, and ultimately reduces the scientific insights that can be gained from each study. Here we address the cell annotation challenge through a novel computational method that greatly facilitates the organization and interpretation of mass cytometry data through automated transfer of biological knowledge. Our method automates cell annotation by using biological knowledge as an input parameter to a novel machine learning approach: Automated Cell-type Discovery and Classification (ACDC). ACDC provides enhanced visualization and automated classification of canonical cell populations, as well as augments the discovery of novel populations from mass cytometry data. ACDC represents a new framework that seamlessly integrates all the pieces to automate the process for estimating occurrences of canonical cell populations. We evaluated ACDC using three benchmark datasets (AML (Levine ), BMMC (Bendall ; Levine ) and PANORAMA (Samusik ), for which manual gating information was available to provide a ‘ground truth’ reference.

2 Methods

Annotating individual cells requires reconciling the vast amounts of single cell information collected through high-throughput cytometry with our prior knowledge. To illustrate this point, it is well established that a CD4+ T-cell is identified based on high levels of CD3 and CD4 and simultaneously having low expression level of CD8. We designed ACDC to take advantage of the biological knowledge that humans have accumulated and integrate this information with machine learning algorithms to automate the annotation of mass cytometry data. To combine our prior biological frameworks with new data, the ACDC approach involves two steps (Fig. 1A and Supplementary Fig. S1). First, ACDC converts a user-specified table of markers and cell labels into landmark points that represent fingerprints for specific cell types in the high-dimensional space. Second, ACDC implements semi-supervised classification via random walks (Grady, 2006) to collect information from all the landmark points and classify events at the single-cell resolution. With ACDC, prior knowledge of canonical cell types is explicitly encoded in the user-specified table, transformed into landmark points and eventually fed into a semi-supervised learning algorithm. We summarize the workflow of ACDC in the following:

Fig. 1

ACDC algorithm design and validation. (A) Schematic diagram showing the work flow of ACDC. (B) Heat maps showing the average marker intensity of landmark points and manually gated populations from the AML dataset. (C) tSNE visualization of landmark points (large circles) and manually gated populations (dots) Inputs: measured mass cytometry events and a user-specified table of markers to cell types. Generate landmark points by score matching and un-supervised clustering. (Section 2.1) Classify single-cell events by semi-supervised learning. (Section 2.2) Study design and evaluations are presented in Section 2.3.

2.1 Generate landmark points

2.1.1 Design of cell type-marker table

A cell type-marker table is a data matrix whose value is either 1 (present, +), -1 (absent, -) or 0 (do not consider), where is the jth cell type and is the kth marker (Supplementary Table S1–S3). The cell type-marker table allows users to customize cell types to be detected by linking these canonical cell types to their marker profiles. For example, CD4+ T-cells are known to have high expression level of the surface markers CD3 and CD4 and low expression level of CD8. Therefore, CD4+ T-cells are described as CD3+/CD4+/CD8- cells. As another example, B-cells can be referred to as CD19+/CD3-cells. ACDC converts the user specified cell type-marker table into landmark points in the high-dimensional space.

2.1.2 Design of the score function

We designed the score function to match a mass cytometry event with a single cell type. Intuitively, the chance a measured event belongs to a canonical cell type is determined by the extent that the intensity profile of a cluster matches one of the pre-specified profiles. We formulated the degree of matching as the posterior probability that a marker is in the activated/inactivated state. To be precise, we first fit a two-mode Gaussian mixture model to the kth marker’s intensity distribution. While the marker intensity is one dimensional, we identified the mode of high/low intensity as the activated/inactivated state of this marker. The score of assigning an event to a cell type is then defined by where is the posterior probability of the kth marker is in state and is the intensity of the kth marker in an event . The minimum is taken over all specified markers to ensure that all requirements are satisfied. In practice, cell types specified by a user might not be exhaustive. To detect those unspecified cells, we added an ‘unknown’ type whose score is defined by This quantity represents the level of uncertainty in our current knowledge since its high value indicates the low probability of assigning any specified cell types to the event . Though can be directly evaluated by the Gaussian mixture model, such posterior probability might not be monotonic if the Gaussian mixture model has modes of unequal variances. We instead used an approximated function where is the critical point that and is the slope of the posterior probability at this critical point. Both and can be computed from the means and variances of the two-mode Gaussian mixture model.

2.1.3 Unsupervised clustering

Community detection (Girvan and Newman, 2002) was used due to its superior performance in clustering mass cytometry data (Levine ). The community detection aims to find a set of assignments that maximize the modularity Q defined by where is the weights between ith node and jth node, and . is the Krnoecker delta function that takes values of 1 when and 0 otherwise. is the community assignment of ith node. We used the recommended setting to generate the weight matrix based on 30-nearest neighbor graph and Jaccard similarity (Levine ).

2.1.4 Landmark point generation

To generate landmark points, we partitioned the whole dataset into subsets . Landmark points were defined as the centers of clusters identified by community detection in each subset.

2.2 Single-cell classification by semi-supervised learning

2.2.1 Classification by random walkers

We implemented semi-supervised classification via random walks (Grady, 2006) for classifying events at the single-cell resolution. Briefly, semi-supervised classification via random walks evaluates the probability that a data point belongs to class as the chance of a random walker, starting from the data point , first reaches a landmark point of class when navigating the network. Theoretical derivation shows this probability satisfies the Laplace equation, i.e. with the boundary conditions if a landmark point of class and if a landmark point of other classes. The numeric value of at every data point can be solved as a boundary value problem. In our implementation, we used 10-nearest neighbors to construct such a data network.

2.2.2 Processing experiments with multiple replicates

A common experimental design with mass cytometry data is to measure multiple biological examples of a particular type (e.g. organism, tissue, treatment condition) in one experiment. To classify data from these replicate samples on a common basis, we computed a common set of landmark points using pooled data of all replications and then classify each replication independently with the same landmark points. Cell frequencies were then estimated by counting the classification results.

2.3 Study design and benchmarking

2.3.1 Validation datasets

We used three public benchmark datasets. BMMC dataset is a mass cytometry dataset collected from healthy human bone marrow (Bendall ). While 34 parameters were originally measured, the publically available dataset reduced to only 13 markers, and the resulting dataset included 24 populations gated based on these markers (Levine ). AML dataset is also collected from healthy human bone marrow (Levine ), and consists of 32 markers and 14 manually gated classes. PANORAMA dataset is a recently published dataset that provides replicative measurements of mass cytometry data from mice, where 24 cellular populations were gated based on 38 surface markers (Samusik ). Three experts independently gated the cellular populations in the PANORAMA dataset and only the consensus part of the gating was retained. All event measurements were transformed by before further processing (Samusik ). Cell type-marker tables were generated according to previous studies (Bendall ; Levine ; Samusik ). The cell type–marker tables of the BMMC and AML dataset were generated based on their gating hierarchy provided on Cytobank (Supplementary Table S1 and S2). In BMMC dataset, erythroblast, megakaryocyte platelet and myelocyte were merged as an unknown population since negative markers exclusively define these cells. For the PANORAMA dataset, the cell type-marker table was generated based on the divisive marker tree with minor changes (Samusik ) (Supplementary Table S3). We excluded HSC cells and pro B cells as unknown types since their defining markers cannot be determined from the reported divisive marker tree.

2.3.2 Baseline methods

We implemented (i) score-based classification; and (ii) phenograph clustering (Levine ) for performance benchmarking. The score-based classification assigns event to the class that maximizes the score, i.e. where is the designed score function. For the phenograph clustering, data was first clustered by community detection and then all events within a cluster were assigned to a manually gated cell type of highest frequency in this cluster. This method was implemented as a counterpart of estimating population frequencies by unsupervised clustering.

2.3.3 Evaluation metrics

We applied three metrics to evaluate the performance on estimating cellular population frequencies. Given two normalized histograms and , generated by counting the number of each cellular category classified either manually or automatically, the maximum error is computed by taking maximum of absolute errors on all components. To be precise, the maximum error is defined by where and are ith elements of histograms and , respectively. The Canberra distance is defined by This distance is chosen to estimate the capability of capturing rare populations since it gives higher penalty on the low-frequency populations. Lastly, the intersection distance, defined by measures the difference between the common area underlying two histograms and 1, which is the largest possible common area. The intersection distance reflects the accumulative errors in all populations. The accuracy of classifying single-cell events is measured by the F1-score, which reflects the harmonic mean of precision (purity) and recall (yield), where is the number of events classified as population i that belongs to the manually gated population j.

2.3.4 Confidence estimation

For validation on AML and BMMC datasets, the confidence level was estimated using 5-fold cross validation while keeping the percentage of samples for each class unchanged. For the PANORAMA dataset, confidence level was estimated as the standard deviation over samples.

2.3.5 Measuring tightness of clusters

We used the silhouette coefficient to measure the tightness of a given cluster (Rousseeuw, 1987). The silhouette coefficient measures how similar a datum is to its own cluster compared to the other clusters. For the ith datum, silhouette coefficient of this datum is defined as where is the average Euclidean distance from this datum to other members of the same cluster, and is the lowest average distance from this datum to members of other clusters. The silhouette coefficient ranges from -1 to 1 while a negative silhouette coefficient indicates a datum is closer to other clusters than its own cluster.

3 Results

3.1 ACDC helps visualization of mass cytometry data

To test whether the detected landmark points represent the corresponding cellular populations, we first applied ACDC to the AML and BMMC datasets. In the AML dataset, ACDC identified every population highlighted in the study and showed virtually no difference with manual gating (Fig. 1B). The one exception was a population of CD34 + CD38 + CD123+ HSPCs that showed a lower average intensity of CD123 in ACDC than with manually gating. To examine how landmark points depicted cellular populations, we used tSNE (Van der Maaten and Hinton, 2008) to map cellular measurements sampled from the manually gated populations onto a two-dimensional space and displayed the detected landmark points in their respective coordinates (Fig. 1C). The tSNE projection also supports the observation that landmark points detected by ACDC fall within their corresponding cluster of cells. We found similar results in the BMMC dataset (Supplementary Fig. S2). These results confirm that landmark points can locate cellular populations as accurate as manual gating.

3.2 ACDC classifies canonical cell populations as accurate as human experts

Although landmark points aid the exploratory analysis of mass cytometry data, the focus of this study was to evaluate whether landmark points classify events accurately at single-cell resolution. For comparison, we implemented two alternative classification methods: (i) a score-based classification that assigns an event to the class of the highest score and (ii) phenograph (Levine ) clustering combined with manual gating to annotate each cluster. Overall, ACDC achieved comparable accuracy (92.9 ± 0.5% for BMMC and 98.3 ± 0.04% for AML) on classifying single-cell events with phenograph clustering (93.6 ± 0.7% for BMMC and 96.5 ± 0.7% for AML) and significantly improved the score-based classification method (78.1 ± 0.03% for BMMC and 68.4 ± 0.1% for AML). We also analyzed the classification performance for each cell type (Fig. 2A and E). In the AML dataset, ACDC achieved a median F1-score of 0.93, compared with 0.84 for the score-based classification and 0.83 for the phenograph clustering. We observed a lower performance of ACDC in the BMMC dataset (median F1-score of 0.60, compared with 0.63 for the score-based classification, and 0.55 for the phenograph clustering) due to the difficulty in detecting rare populations with frequencies less than 0.5%, such as GMP, HSC, MEP and MPP. However, low silhouette coefficients suggest that these rare populations may not form well-defined clusters (Fig. 2B and F and Supplementary Fig. S3). Both the score-based and phenograph clustering methods also failed to identify these rare populations due to a lack of representative data for these cell types.

Fig. 2

Validation on AML and BMMC datasets. (A, E) Classification accuracy of ACDC (yellow bars), score-based classification (purple bars), and phenograph clustering (gray bars) evaluated by F1-score. (B, F) Silhouette coefficients of manually gated populations show cluster tightness. (C, G) Comparison of population frequencies estimated by the 3 methods versus manual gating (green bars). (D, H) Errors in estimating population frequencies. Error bars reflect the standard deviations of the accuracy estimates from the cross-validation trials described in Section 2.3.4

3.3 ACDC estimates frequencies of canonical cell populations as accurate as human experts

We next addressed the practical issue of estimating the frequency of a cell population. When applied to the AML and BMMC datasets, ACDC and the phenograph clustering gave estimates comparable to the manually gated ones while the score-based classification method overestimated the frequency of the unknown population (Fig. 2C and G). To quantify discrepancies between the estimated and manually gated frequencies in all populations, we examined three common metrics: maximum error, Canberra distance, and intersection distance, which measure maximum deviations, the capability of capturing rare populations and accumulative errors respectively. In general, both ACDC and the phenograph clustering estimated the population frequency up to 2% maximum error of manual gating reports and 2–5% error accumulatively on these two datasets (Fig. 2D and H). However, ACDC showed a lower Canberra distance to manual gating, highlighting lower discrepancy for rare populations.

3.4 ACDC captures sample variations in population frequencies

In addition to evaluating the classification accuracy using data collected from one set of samples, we wondered if ACDC captured variations accurately over biological replicates in the PANORAMA dataset (Fig. 3A). We computed correlations between estimated and manually gated frequencies per cell type (Fig. 3B). ACDC achieved an average per-cell type correlation of 0.79, compared to the correlation of 0.71 for the score-based classification and 0.38 for phenograph clustering. Regarding classifying single-cell events, ACDC achieved a median F1-score of 0.88 (Fig. 3C) compared to 0.79 obtained in the original study (Samusik ), though two cell types were omitted due to the lack of defining markers when curating the input table for ACDC (see Methods for full details). These results confirm that ACDC more accurately captures sample variations reflected in the manually gated results.

Fig. 3

Validation on PANORAMA dataset. (A) Frequencies of cellular populations estimated by manual gating (green bars), ACDC (yellow bars), scored-based classification (purple bars) and phenograph clustering (gray bars). All events excluded by manual gating were labeled ‘unknown.’ (B) Per-cell type Pearson correlations over 10 replications. (C) Average F1-scores over 10 replications. Error bars represent standard deviations

3.5 ACDC discovers ambiguous populations from mass cytometry data

One challenge for supervised learning approaches is the limited ability to discover categories not present in the training data. Here we demonstrate that ACDC provides insight on clusters of cells that do not fit into any of the pre-defined cell types. Specifically, 24 clusters of unknown cell types detected from the PANORAMA dataset (Supplementary Fig. S4). We found that one of the unknown clusters showed marker patterns similar to both IgD + IgM+ B-cells and CD8+ T cells (Fig. 4A). This profile suggests the unknown cluster represents some form of lymphoid cells sharing characteristics of B cells and CD8 T cells. We also found a cluster of unknown cell types that shared features of IgD + IgM+ B cells and CD4+ T cells, and cannot be easily categorized into conventional types (Fig. 4B). Though we cannot exclude the possibility these events are doublets that slipped though the pre-gating quality control carried out in (Samusik ) (Supplementary Fig. S5), these results demonstrated that ACDC can highlight ambiguous events that escaped the automated classification for further investigation. However, resolving the biological identity of these events may require utilization of collaborative evidence.

Fig. 4

Illustration of selected unknown clusters. (A) Two-dimensional heatmap shows the profile of an unknown cluster sharing features of CD8+ T cells, IgD + IgM+ B cells and gamma-delta T cells (rows shown below). Colors reflect the marker intensity. (B) Heatmap indicates the profile of an unknown cluster sharing features of CD4+ T cells and IgD + IgM+ B cells (rows shown below). The top-3 similar canonical populations are shown right below the unknown cluster

3.6 Robustness and computational complexity

We evaluated whether ACDC is robust to changes in the parameter tuning. ACDC uses one parameter k to construct nearest neighbor networks for semi-supervised classification. Table 1 shows the classification accuracy evaluated on the BMMC and AML benchmark datasets when setting k to 10, 20 and 30. The results are not sensitive to the parameter k over a 3-fold range.

Table 1

Computational performance of ACDC

	Accuracy (%)			Time (s)			Events
k-nn	10	20	30	10	20	30
BMMC	92.02	92.24	92.49	245	309	376	81747
AML	98.36	98.30	98.25	884	992	1077	103184

Computational performance of ACDC We also examined the computational complexity of ACDC. The most expensive computational step in ACDC is the semi-supervised classification, which involves constructing and inverting a large matrix. In our current implementation, ACDC takes ∼250 and ∼900 seconds to process BMMC and AML benchmarks (). This computation was done on a machine with an Intel® Core™ i7-6700K Processor 3.40 GHz and 16 GB RAM. By comparison, it takes ∼125 and ∼550 s to cluster the BMMC and AML datasets using Phenograph on the same machine.

4 Conclusion

Here we have introduced a new method called ACDC that combines profile matching and semi-supervised learning to automate the analysis and interpretation of mass cytometry data. ACDC takes advantage of biological knowledge to guide learning algorithms and creates a new framework for interpreting data from high-dimensional cytometry. By using biological knowledge as an input for the analysis, we turned the unsupervised problem of data interpretation into a semi-supervised problem of network propagation. Our results suggest ACDC reliably classifies single-cell events and aids discovery of novel cell types. One limitation of ACDC is that each marker label is binary (present or absent). In practice, cell populations of interests are defined by intermediate markers (Guilliams ; Levine ; Ohradanova-Repic ; Rosenblum ). One possible improvement is to extend the Gaussian mixture model and consider multiple states (Chan ; Cron ), and we anticipate this development in a future study. Given the active development of many algorithms to facilitate the processing and analysis of high-throughput cytometry data, recent efforts have also been focused on developing reproducible pipelines and frameworks (Aghaeepour , 2016; Finak ). The introduction of a study-specific table with markers and cell labels offers a new direction toward automatic and reproducible analysis of mass cytometry data. With this easy-to-customize design, the annotation step feeds into cytometry data analysis upfront. This feature allows the cellular determinations to be reproduced or modified easily with a given cell type–marker table. Additionally, flagging ambiguous events help sift through the massive data to guide researchers for follow up on areas of quality control and process improvement, as well as the discovery of biologically relevant cell populations. Currently, our design requires a table specified by the analyst. However, there’s no limit to what information goes into this table. Thus, it’s possible to infer a comprehensive table automatically from the complete biomedical literature mining (Courtot ; Shen-Orr ) or through a targeted query of an immunological database (Courtot ). The community has long recognized the importance of reliable immunophenotyping analysis in flow cytometry (Aghaeepour ; Finak ). Additional efforts to integrate existing tools into shared computational pipelines for better CyTOF processing and cell type enumeration are needed. With the removal of the manual processing steps that currently limit large-scale CyTOF analysis, we envision ACDC as a step toward a new paradigm of reproducible, systematic and objective immunophenotyping that fully embraces high-dimensional datasets for discovery and translation to actionable insights. Click here for additional data file.

27 in total

1. Random walks for image segmentation.

Authors: Leo Grady
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2006-11 Impact factor: 6.226

2. flowCL: ontology-based cell population labelling in flow cytometry.

Authors: Mélanie Courtot; Justin Meskas; Alexander D Diehl; Radina Droumeva; Raphael Gottardo; Adrin Jalali; Mohammad Jafar Taghiyar; Holden T Maecker; J Philip McCoy; Alan Ruttenberg; Richard H Scheuermann; Ryan R Brinkman
Journal: Bioinformatics Date: 2014-12-06 Impact factor: 6.937

3. Cytometry by time-of-flight shows combinatorial cytokine expression and virus-specific cell niches within a continuum of CD8+ T cell phenotypes.

Authors: Evan W Newell; Natalia Sigal; Sean C Bendall; Garry P Nolan; Mark M Davis
Journal: Immunity Date: 2012-01-27 Impact factor: 31.745

Review 4. Regulatory T cell memory.

Authors: Michael D Rosenblum; Sing Sing Way; Abul K Abbas
Journal: Nat Rev Immunol Date: 2015-12-21 Impact factor: 53.106

5. Combinatorial tetramer staining and mass cytometry analysis facilitate T-cell epitope mapping and characterization.

Authors: Evan W Newell; Natalia Sigal; Nitya Nair; Brian A Kidd; Harry B Greenberg; Mark M Davis
Journal: Nat Biotechnol Date: 2013-06-09 Impact factor: 54.908

Review 6. Dendritic cells, monocytes and macrophages: a unified nomenclature based on ontogeny.

Authors: Martin Guilliams; Florent Ginhoux; Claudia Jakubzick; Shalin H Naik; Nobuyuki Onai; Barbara U Schraml; Elodie Segura; Roxane Tussiwand; Simon Yona
Journal: Nat Rev Immunol Date: 2014-07-18 Impact factor: 53.106

7. Towards a cytokine-cell interaction knowledgebase of the adaptive immune system.

Authors: Shai S Shen-Orr; Ofir Goldberger; Yael Garten; Yael Rosenberg-Hasson; Patricia A Lovelace; David L Hirschberg; Russ B Altman; Mark M Davis; Atul J Butte
Journal: Pac Symp Biocomput Date: 2009

8. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE.

Authors: Peng Qiu; Erin F Simonds; Sean C Bendall; Kenneth D Gibbs; Robert V Bruggner; Michael D Linderman; Karen Sachs; Garry P Nolan; Sylvia K Plevritis
Journal: Nat Biotechnol Date: 2011-10-02 Impact factor: 54.908

9. Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples.

Authors: Andrew Cron; Cécile Gouttefangeas; Jacob Frelinger; Lin Lin; Satwinder K Singh; Cedrik M Britten; Marij J P Welters; Sjoerd H van der Burg; Mike West; Cliburn Chan
Journal: PLoS Comput Biol Date: 2013-07-11 Impact factor: 4.475

10. Critical assessment of automated flow cytometry data analysis techniques.

Authors: Nima Aghaeepour; Greg Finak; Holger Hoos; Tim R Mosmann; Ryan Brinkman; Raphael Gottardo; Richard H Scheuermann
Journal: Nat Methods Date: 2013-02-10 Impact factor: 28.547

8 in total

1. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

Authors: Alexandra J Lee; Ivan Chang; Julie G Burel; Cecilia S Lindestam Arlehamn; Aishwarya Mandava; Daniela Weiskopf; Bjoern Peters; Alessandro Sette; Richard H Scheuermann; Yu Qian
Journal: Cytometry A Date: 2018-04-17 Impact factor: 4.355

2. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA.

Authors: Weiruo Zhang; Irene Li; Nathan E Reticker-Flynn; Zinaida Good; Serena Chang; Nikolay Samusik; Saumyaa Saumyaa; Yuanyuan Li; Xin Zhou; Rachel Liang; Christina S Kong; Quynh-Thu Le; Andrew J Gentles; John B Sunwoo; Garry P Nolan; Edgar G Engleman; Sylvia K Plevritis
Journal: Nat Methods Date: 2022-06-02 Impact factor: 28.547

3. CyAnno: A semi-automated approach for cell type annotation of mass cytometry datasets.

Authors: Abhinav Kaushik; Diane Dunham; Ziyuan He; Monali Manohar; Manisha Desai; Kari C Nadeau; Sandra Andorf
Journal: Bioinformatics Date: 2021-05-25 Impact factor: 6.931

4. Predicting Cell Populations in Single Cell Mass Cytometry Data.

Authors: Tamim Abdelaal; Vincent van Unen; Thomas Höllt; Frits Koning; Marcel J T Reinders; Ahmed Mahfouz
Journal: Cytometry A Date: 2019-03-12 Impact factor: 4.355

5. A comparison framework and guideline of clustering methods for mass cytometry data.

Authors: Xiao Liu; Weichen Song; Brandon Y Wong; Ting Zhang; Shunying Yu; Guan Ning Lin; Xianting Ding
Journal: Genome Biol Date: 2019-12-23 Impact factor: 13.583

6. Machine Learning of Discriminative Gate Locations for Clinical Diagnosis.

Authors: Disi Ji; Preston Putzel; Yu Qian; Ivan Chang; Aishwarya Mandava; Richard H Scheuermann; Jack D Bui; Huan-You Wang; Padhraic Smyth
Journal: Cytometry A Date: 2019-11-05 Impact factor: 4.355

Review 7. CyTOF^® for the Masses.

Authors: Akshay Iyer; Anouk A J Hamers; Asha B Pillai
Journal: Front Immunol Date: 2022-04-14 Impact factor: 8.786

Review 8. Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data.

Authors: Peng Liu; Silvia Liu; Yusi Fang; Xiangning Xue; Jian Zou; George Tseng; Liza Konnikova
Journal: Front Cell Dev Biol Date: 2020-04-28