Impact of similarity metrics on single-cell RNA-seq data clustering.

Taiyun Kim¹, Irene Rui Chen¹, Yingxin Lin¹, Andy Yi-Yang Wang², Jean Yee Hwa Yang¹, Pengyi Yang¹.

Abstract

Advances in high-throughput sequencing of single-cell gene expression [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling of individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, they all fundamentally rely on a similarity metric to categorise individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, there is currently no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics for clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus in the performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test whether correlation-based metrics can benefit recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) to use Pearson's correlation as its similarity measure and found a significant performance improvement over Euclidean distance in scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. A further comparison of different scRNA-seq library preparation protocols suggests that the protocol may also affect clustering performance.
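To make the contrast between the two families of metrics concrete, the sketch below (Python with NumPy; an illustration, not the authors' benchmarking code) computes pairwise 1 - Pearson correlation and pairwise Euclidean distance between cells stored as the columns of a genes x cells matrix. The function names `pearson_dissimilarity` and `euclidean_distance` and the toy matrix are hypothetical. The example shows one well-known property: Pearson correlation is unchanged when a cell's whole profile is rescaled (for example, a cell sequenced twice as deeply), whereas Euclidean distance is not.

```python
import numpy as np


def pearson_dissimilarity(x):
    """1 - Pearson correlation between every pair of columns (cells) of a genes x cells matrix."""
    # rowvar=False: each column is treated as one variable (one cell).
    return 1.0 - np.corrcoef(x, rowvar=False)


def euclidean_distance(x):
    """Euclidean distance between every pair of columns (cells) of a genes x cells matrix."""
    cells = x.T
    sq_norms = np.sum(cells ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip tiny negatives caused by rounding.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * cells @ cells.T
    return np.sqrt(np.clip(d2, 0.0, None))


# Hypothetical toy genes x cells matrix (4 genes, 3 cells):
#   A: a baseline profile, B: the same profile at twice the depth, C: A reversed.
a = np.array([1.0, 2.0, 3.0, 4.0])
x = np.column_stack([a, 2.0 * a, a[::-1]])

pearson = pearson_dissimilarity(x)
euclid = euclidean_distance(x)
print("1 - r:     A-B =", round(pearson[0, 1], 3), " A-C =", round(pearson[0, 2], 3))
print("Euclidean: A-B =", round(euclid[0, 1], 3), " A-C =", round(euclid[0, 2], 3))
```

Here 1 - r is 0 for A-B and 2 for A-C, while the Euclidean distances are sqrt(30) ≈ 5.477 for A-B and sqrt(20) ≈ 4.472 for A-C, so Euclidean distance places A closer to its reversed profile than to its own rescaled copy. This is offered only as one intuition for the finding above, not as the paper's explanation of it.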
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
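The evaluation loop summarised in the abstract (stratified subsampling within an annotated dataset, clustering under each similarity metric, scoring against the annotation) can be sketched roughly as below. This is not the benchmarking framework at the URL above. It assumes k-medoids as the clustering algorithm (chosen here only because it accepts any precomputed dissimilarity) and the adjusted Rand index (ARI) as a single example of an evaluation measure; the function names, the `METRICS` dictionary, the subsampling fraction, the number of repeats and the toy data are all hypothetical.

```python
import numpy as np
from math import comb


def stratified_subsample(labels, fraction, rng):
    """Indices keeping about `fraction` of the cells of every annotated type (at least one each)."""
    labels = np.asarray(labels)
    picked = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        n = max(1, int(round(fraction * idx.size)))
        picked.extend(rng.choice(idx, size=n, replace=False).tolist())
    return np.sort(np.array(picked))


def k_medoids(d, k, n_iter=100):
    """Alternating k-medoids on a precomputed n x n dissimilarity matrix; returns cluster ids."""
    # Deterministic farthest-point initialisation: start from the most central cell,
    # then repeatedly add the cell farthest from all medoids chosen so far.
    medoids = [int(np.argmin(d.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(d[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        assign = np.argmin(d[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if members.size:
                # New medoid: the member with the smallest total dissimilarity to its cluster.
                within = d[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(d[:, medoids], axis=1)


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie) between two labelings of the same cells."""
    _, a = np.unique(np.asarray(a), return_inverse=True)
    _, b = np.unique(np.asarray(b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)

    def pairs(counts):
        return sum(comb(int(v), 2) for v in counts)

    index = pairs(table.ravel())
    rows, cols = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = rows * cols / comb(a.size, 2)
    max_index = 0.5 * (rows + cols)
    # Raises ZeroDivisionError when, for example, both labelings put every cell in one cluster.
    return (index - expected) / (max_index - expected)


# Dissimilarities between the columns (cells) of a genes x cells matrix m.
METRICS = {
    "pearson": lambda m: 1.0 - np.corrcoef(m, rowvar=False),
    "euclidean": lambda m: np.linalg.norm(m.T[:, None, :] - m.T[None, :, :], axis=-1),
}


def benchmark(x, labels, metrics=METRICS, fraction=0.8, n_repeats=10, seed=0):
    """Mean ARI of each metric over repeated stratified subsamples of a genes x cells matrix."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    k = np.unique(labels).size
    scores = {name: [] for name in metrics}
    for _ in range(n_repeats):
        idx = stratified_subsample(labels, fraction, rng)
        for name, metric in metrics.items():
            predicted = k_medoids(metric(x[:, idx]), k)
            scores[name].append(adjusted_rand_index(labels[idx], predicted))
    return {name: float(np.mean(s)) for name, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two cell types with reversed expression profiles, each
    # observed at ten overall scales (a crude stand-in for sequencing depth).
    rng = np.random.default_rng(1)
    profile = np.linspace(1.0, 10.0, 20)
    scales = np.linspace(1.0, 10.0, 10)
    x = np.column_stack([p * s for p in (profile, profile[::-1]) for s in scales])
    x = x + rng.normal(scale=0.05, size=x.shape)
    labels = np.repeat(["A", "B"], scales.size)
    print(benchmark(x, labels))
```

Averaging over repeated stratified subsamples, rather than scoring a single clustering of the full dataset, is what lets a loop like this report a stable per-metric summary; a fuller benchmark would add more clustering algorithms and more evaluation measures in the inner loop.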
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  clustering; correlation; distance; scRNA-seq; similarity metric; single-cell RNA-seq

Year:  2019        PMID: 30137247     DOI: 10.1093/bib/bby076

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  27 in total

1.  scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets.

Authors:  Yingxin Lin; Shila Ghazanfar; Kevin Y X Wang; Johann A Gagnon-Bartsch; Kitty K Lo; Xianbin Su; Ze-Guang Han; John T Ormerod; Terence P Speed; Pengyi Yang; Jean Yee Hwa Yang
Journal:  Proc Natl Acad Sci U S A       Date:  2019-04-26       Impact factor: 11.205

2.  Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview.

Authors:  Shaked Slovin; Annamaria Carissimo; Francesco Panariello; Antonio Grimaldi; Valentina Bouché; Gennaro Gambardella; Davide Cacchiarelli
Journal:  Methods Mol Biol       Date:  2021

3.  A single-cell atlas of mouse lung development.

Authors:  Nicholas M Negretti; Erin J Plosa; John T Benjamin; Bryce A Schuler; A Christian Habermann; Christopher S Jetter; Peter Gulleman; Claire Bunn; Alice N Hackett; Meaghan Ransom; Chase J Taylor; David Nichols; Brittany K Matlock; Susan H Guttentag; Timothy S Blackwell; Nicholas E Banovich; Jonathan A Kropski; Jennifer M S Sucre
Journal:  Development       Date:  2021-12-20       Impact factor: 6.868

4.  Shared Differential Expression-Based Distance Reflects Global Cell Type Relationships in Single-Cell RNA Sequencing Data.

Authors:  Aidan Mcloughlin; Haiyan Huang
Journal:  J Comput Biol       Date:  2022-07-06       Impact factor: 1.549

5.  Heterogeneous data integration methods for patient similarity networks. (Review)

Authors:  Jessica Gliozzo; Marco Mesiti; Marco Notaro; Alessandro Petrini; Alex Patak; Antonio Puertas-Gallardo; Alberto Paccanaro; Giorgio Valentini; Elena Casiraghi
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

6.  Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans).

Authors:  Alfred Ultsch; Jörn Lötsch
Journal:  BMC Bioinformatics       Date:  2022-06-16       Impact factor: 3.307

7.  Occult polyclonality of preclinical pancreatic cancer models drives in vitro evolution.

Authors:  Maria E Monberg; Heather Geiger; Jaewon J Lee; Roshan Sharma; Alexander Semaan; Vincent Bernard; Justin Wong; Fang Wang; Shaoheng Liang; Daniel B Swartzlander; Bret M Stephens; Matthew H G Katz; Ken Chen; Nicolas Robine; Paola A Guerrero; Anirban Maitra
Journal:  Nat Commun       Date:  2022-06-25       Impact factor: 17.694

8.  Optimal Transport improves cell-cell similarity inference in single-cell omics data.

Authors:  Geert-Jan Huizing; Gabriel Peyré; Laura Cantini
Journal:  Bioinformatics       Date:  2022-02-14       Impact factor: 6.937

9.  Sphetcher: Spherical Thresholding Improves Sketching of Single-Cell Transcriptomic Heterogeneity.

Authors:  Van Hoan Do; Khaled Elbassioni; Stefan Canzar
Journal:  iScience       Date:  2020-05-04

10.  scClassify: sample size estimation and multiscale classification of cells using single and multiple reference.

Authors:  Yingxin Lin; Yue Cao; Hani Jieun Kim; Agus Salim; Terence P Speed; David M Lin; Pengyi Yang; Jean Yee Hwa Yang
Journal:  Mol Syst Biol       Date:  2020-06       Impact factor: 11.429
