Hao Jiang (1), Lydia L Sohn (2), Haiyan Huang (3), Luonan Chen (4, 5). (1) Department of Mathematics, School of Information, Renmin University of China, Beijing, China. (2) Department of Mechanical Engineering, University of California, Berkeley, CA, USA. (3) Department of Statistics, University of California, Berkeley, CA, USA. (4) Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China. (5) CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
Abstract
Motivation: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. Identifying intercellular transcriptomic heterogeneity is one of the most critical tasks in single-cell RNA-sequencing studies. Results: We propose a new cell similarity measure based on cell-pair differentiability correlation, which is derived from gene differential patterns among all cell pairs. By plugging this new measure into the framework of hierarchical clustering, we further develop a variance-analysis-based clustering algorithm, 'Corr', that can determine the number of clusters automatically and identify cell types accurately. The robustness and superiority of the proposed algorithm are assessed against representative algorithms, namely shared nearest neighbor (SNN)-Cliq and several other state-of-the-art clustering methods, on many benchmark or real single-cell RNA-sequencing datasets in terms of both internal criteria (clustering number and accuracy) and external criteria (purity, adjusted Rand index, F1-measure). Moreover, the differentiability vector with our new measure provides a new means of identifying potential biomarkers from cancer-related single-cell datasets, even with strong noise. Prognosis analyses on independent cancer datasets confirmed the effectiveness of our 'Corr' method. Availability and implementation: The source code (Matlab) is available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/Corr--SourceCodes.zip. Supplementary information: Supplementary data are available at Bioinformatics online.
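Two of the external criteria named in the abstract, purity and the adjusted Rand index, have standard textbook definitions. The sketch below illustrates those definitions in Python and is not taken from the authors' Matlab code. The function names `purity` and `adjusted_rand_index` and the toy label vectors are assumptions made for this example. The F1-measure is left out because the abstract does not specify which variant is used.

```python
from collections import Counter
from math import comb


def purity(true_labels, pred_labels):
    """Fraction of cells that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(true_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """Chance-corrected pair-counting agreement between two partitions."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # convention for this sketch: a single cell agrees trivially
    # Contingency counts n_ij, plus row and column sums.
    joint = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    sum_joint = sum(comb(v, 2) for v in joint.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_joint - expected) / (max_index - expected)


# Toy example: six cells, two known types, one cell misassigned.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(purity(truth, found))
print(adjusted_rand_index(truth, found))
```

Both scores compare a clustering result against known cell-type labels. Purity rewards clusters dominated by one type but does not penalize over-splitting, whereas the adjusted Rand index corrects for chance agreement, which is one reason the two are commonly reported together.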