Nikolay Samusik, Zinaida Good, Matthew H Spitzer, Kara L Davis, Garry P Nolan.
Abstract
Accurate identification of cell subsets in complex populations is key to discovering novelty in multidimensional single-cell experiments. We present X-shift (http://web.stanford.edu/~samusik/vortex/), an algorithm that processes data sets using fast k-nearest-neighbor estimation of cell event density and arranges populations by marker-based classification. X-shift enables automated cell-subset clustering and access to biological insights that 'prior knowledge' might prevent the researcher from discovering.
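The abstract names two ingredients: K-nearest-neighbor density estimation and grouping cell events around local density maxima (Figure 1b–c). The plain-Python sketch below illustrates that idea as generic kNN mode seeking. It is not the authors' implementation. The density proxy (inverse mean distance to the K neighbors), the rule that links each point to its densest neighbor, and all function names are assumptions made for illustration. The density-separation test between neighboring populations (Figure 1d) is omitted.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], k: int) -> List[List[int]]:
    """Brute-force K nearest neighbors of every point (the point itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort (distance, index) pairs; equal distances are ordered by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def knn_density(points: List[Point], neighbors: List[List[int]]) -> List[float]:
    """Density proxy (ASSUMED form): inverse of the mean distance to the K neighbors."""
    density = []
    for i, nbrs in enumerate(neighbors):
        mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        density.append(1.0 / (mean_dist + 1e-12))  # epsilon guards against duplicate points
    return density


def knn_mode_seeking(points: List[Point], k: int) -> List[int]:
    """Label each point by the local density maximum reached by hill-climbing over K neighbors."""
    neighbors = knn_indices(points, k)
    density = knn_density(points, neighbors)

    # ASSUMED linking rule: point to the densest neighbor, only if it is strictly denser.
    parent = []
    for i, nbrs in enumerate(neighbors):
        best = max(nbrs, key=lambda j: density[j])
        parent.append(best if density[best] > density[i] else i)

    # Density strictly increases along every link, so the walk always ends
    # at a point that is its own parent, i.e. a local density maximum.
    def find_peak(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    # Relabel peaks as consecutive cluster ids in order of first appearance.
    cluster_of_peak: Dict[int, int] = {}
    labels = []
    for i in range(len(points)):
        peak = find_peak(i)
        labels.append(cluster_of_peak.setdefault(peak, len(cluster_of_peak)))
    return labels


if __name__ == "__main__":
    # Two plus-shaped toy clouds, far apart; each has one densest centre point.
    def plus_cloud(cx: float, cy: float) -> List[Point]:
        offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (2, 0), (-2, 0), (0, 2), (0, -2)]
        return [(cx + dx, cy + dy) for dx, dy in offsets]

    data = plus_cloud(0, 0) + plus_cloud(20, 20)
    print(knn_mode_seeking(data, k=4))  # expected: nine 0s followed by nine 1s
```

With small K, this kind of hill-climbing tends to split one population into several peaks, which is why a merging step such as the density-separation test is needed in practice.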
Year: 2016 PMID: 27183440 PMCID: PMC4896314 DOI: 10.1038/nmeth.3863
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. X-shift algorithm design and validation
(a–c) Workflow of the X-shift algorithm. (a) Synthetic two-dimensional dataset with three ‘point clouds’. (b) K-nearest-neighbor density estimation; example sets of 20 nearest neighbors are shown for three data points. (c) Connecting data points against the gradient of the density estimate and finding local maxima. (d) Testing neighboring populations for density separation. (e) X-shift clustering of synthetic data. Randomly generated datasets with 10 populations in 15 dimensions, 20 populations in 25 dimensions and 30 populations in 35 dimensions were clustered with X-shift while the number of nearest neighbors (K) used for density estimation was varied from 100 to 5. The blue line shows the curve fitted by line-plus-exponent regression. (f) Performance of X-shift in automatic parameter-finding mode on the 12-color FlowCAP I Normal Donor dataset, compared with the FlowCAP I Challenge I submissions [4]. (g) Scheme for evaluating X-shift performance against hand-gated CyTOF data. (h) X-shift clusterings of mouse bone marrow data at various K settings were compared with hand gates, and the median F-measures over 10 biological replicates were plotted as stacked areas. Each population label is positioned at the point where its F-measure first reaches 90% of its maximum. (i) Results of X-shift analysis of bone marrow data when K was selected automatically for each of the 10 replicates. Bars show median values across replicates; error bars represent the interquartile range.
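Panel (h) scores X-shift clusters against hand gates with the F-measure (the harmonic mean of precision and recall) and summarizes the 10 replicates by their median. A minimal sketch of that bookkeeping follows. The rule that each hand-gated population is scored by its single best-matching cluster is an assumption (a common convention in FlowCAP-style evaluations, not confirmed here), and the function and gate names are hypothetical.

```python
from statistics import median
from typing import Dict, Hashable, List, Sequence, Set


def f_measure(cluster: Set[int], gate: Set[int]) -> float:
    """Harmonic mean of precision and recall of one cluster against one hand gate."""
    overlap = len(cluster & gate)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(gate)
    return 2 * precision * recall / (precision + recall)


def best_f_per_gate(labels: Sequence[Hashable], gates: Dict[str, Set[int]]) -> Dict[str, float]:
    """For each gate, the best F-measure reached by any single cluster (ASSUMED matching rule)."""
    clusters: Dict[Hashable, Set[int]] = {}
    for event, label in enumerate(labels):
        clusters.setdefault(label, set()).add(event)
    return {
        name: max(f_measure(members, gate) for members in clusters.values())
        for name, gate in gates.items()
    }


def median_over_replicates(per_replicate: List[Dict[str, float]]) -> Dict[str, float]:
    """Median F-measure of each gate across biological replicates."""
    return {name: median(scores[name] for scores in per_replicate) for name in per_replicate[0]}


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 2]  # hypothetical cluster assignment of six events
    gates = {"gate_A": {0, 1, 2}, "gate_B": {3, 4, 5}}  # hypothetical hand gates
    print(best_f_per_gate(labels, gates))  # gate_A is matched exactly; gate_B scores about 0.8
```

Scoring each gate by its best single cluster rewards a clustering that does not fragment a hand-gated population; a gate split across two clusters, like gate_B above, loses recall and therefore F-measure.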
Figure 2. X-shift clustering reveals novel features of mouse hematopoietic differentiation
(a) Clustering of bone marrow replicate #7 with X-shift (K = 20, auto-selected by the switch-point-finding algorithm), represented as a Divisive Marker Tree. Node radii are proportional to the cube root of the number of cell events at each node. The tree is nested, i.e. each parent node contains the union of the cell events of its children. Node labels show the marker cutoff values that define each sub-branch, expressed on the arsinh(x/5) scale. (b) X-shift finds biologically relevant subsets within the hand-gated cell populations (bone marrow replicate #7, X-shift K = 20). (c) Single-cell force-directed layout of mouse bone marrow replicate #7 (X-shift K = 20, 48 clusters). Colors indicate X-shift clusters; grey boxes mark the locations of hand-gated cell populations. (d) Force-directed layout of populations related to monocyte development. Colors represent expression levels of the indicated markers. (e) Force-directed layout of populations related to pDC development. Colors represent expression levels of the indicated markers.
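Panel (a) states three concrete conventions: marker cutoffs are given on the arsinh(x/5) scale, node radii scale with the cube root of a node's event count, and every parent node holds the union of its children's events. The helpers below sketch those conventions. The function names, the scale factor and the MarkerNode class are hypothetical illustrations, not part of the published tool.

```python
import math
from dataclasses import dataclass, field
from typing import FrozenSet, List

COFACTOR = 5.0  # from the caption: cutoffs are expressed on the arsinh(x/5) scale


def to_arsinh(raw: float, cofactor: float = COFACTOR) -> float:
    """Map a raw marker intensity onto the arsinh(x / cofactor) scale."""
    return math.asinh(raw / cofactor)


def from_arsinh(value: float, cofactor: float = COFACTOR) -> float:
    """Invert the transform, e.g. to read a tree cutoff back in raw intensity units."""
    return cofactor * math.sinh(value)


def node_radius(n_events: int, scale: float = 1.0) -> float:
    """Node radius proportional to the cube root of the number of cell events."""
    return scale * n_events ** (1.0 / 3.0)


@dataclass
class MarkerNode:
    """Hypothetical nested tree node: a parent holds the union of its children's events."""
    label: str
    own_events: FrozenSet[int] = frozenset()
    children: List["MarkerNode"] = field(default_factory=list)

    def events(self) -> FrozenSet[int]:
        # Leaves store events directly; inner nodes derive theirs from the children.
        if not self.children:
            return self.own_events
        return frozenset().union(*(child.events() for child in self.children))


if __name__ == "__main__":
    leaf_a = MarkerNode("marker_X > 2.0", frozenset({0, 1, 2}))  # hypothetical cutoff label
    leaf_b = MarkerNode("marker_X <= 2.0", frozenset({3, 4}))
    root = MarkerNode("all events", children=[leaf_a, leaf_b])
    print(sorted(root.events()), node_radius(len(root.events())), from_arsinh(2.0))
```

The cube-root scaling keeps node areas from being dominated by the largest populations, so rare subsets remain visible next to nodes holding many more events.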