MOTIVATION: New single-cell technologies continue to fuel explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity within them. RESULTS: We introduce PARC (Phenotyping by Accelerated Refined Community-partitioning), a highly scalable graph-based clustering algorithm for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC, without subsampling of cells, consistently outperforms state-of-the-art clustering algorithms, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Marilin S Koch; Mykola Zdioruk; Michal O Nowicki; Alec M Griffith; Estuardo Aguilar; Laura K Aguilar; Brian W Guzik; Francesca Barone; Paul P Tak; Ghazaleh Tabatabai; James A Lederer; E Antonio Chiocca; Sean Lawler Journal: J Immunother Cancer Date: 2022-01 Impact factor: 13.751
Authors: Marilin S Koch; Mykola Zdioruk; Michal O Nowicki; Alec M Griffith; Estuardo Aguilar-Cordova; Laura K Aguilar; Brian W Guzik; Francesca Barone; Paul Peter Tak; Katharina Schregel; Michael S Hoetker; James A Lederer; E Antonio Chiocca; Ghazaleh Tabatabai; Sean E Lawler Journal: Mol Ther Oncolytics Date: 2022-07-31 Impact factor: 6.311
Authors: Aaron J Wilk; Madeline J Lee; Bei Wei; Benjamin Parks; Ruoxi Pi; Giovanny J Martínez-Colón; Thanmayi Ranganath; Nancy Q Zhao; Shalina Taylor; Winston Becker; David Jimenez-Morales; Andra L Blomkalns; Ruth O'Hara; Euan A Ashley; Kari C Nadeau; Samuel Yang; Susan Holmes; Marlene Rabinovitch; Angela J Rogers; William J Greenleaf; Catherine A Blish Journal: J Exp Med Date: 2021-06-15 Impact factor: 17.579
Authors: Evan Greene; Greg Finak; Leonard A D'Amico; Nina Bhardwaj; Candice D Church; Chihiro Morishima; Nirasha Ramchurren; Janis M Taube; Paul T Nghiem; Martin A Cheever; Steven P Fling; Raphael Gottardo Journal: Patterns (N Y) Date: 2021-10-27